当前位置：首页 > news >正文

cuda编程笔记（4）--纹理内存

news 2025/9/16 3:04:04

CUDA 中的 纹理内存（Texture Memory） 是一种专门为图像处理优化的内存访问方式，它使用了 专用的缓存机制（Texture Cache） 来提升内存访问效率，尤其在图像处理、科学计算中，访问图像数据或二维数据区域时表现优异。

纹理内存并不是一种“新的内存”，而是对显存（全局内存）中数据的一种特殊读取方式，提供了更高效的数据访问模式（带缓存 + 插值支持），特别适合处理：

图像、视频等 二维或三维数据
有 空间局部性 的数据访问（相邻线程访问相邻数据）

特性	说明
缓存机制	纹理读取使用专用的 texture cache，速度通常快于直接访问 global memory。
插值功能	支持硬件加速的线性插值、坐标映射等功能（可配置）
只读	对于设备代码（kernel），纹理内存通常是只读的
空间局部性优化	纹理缓存根据访问模式优化，例如二维线程块访问二维图像
兼容性	既支持 1D、2D、3D，也支持绑定 CUDA array 或 linear memory

纹理内存的优势：

如果线程访问的数据具有空间局部性（相邻线程访问相邻地址），纹理缓存就能大幅提升带宽利用率。
如果你想要硬件插值（比如图像中采样某个像素的中间值），纹理提供天然支持。
在访问二维、三维结构化数据（图像、网格）时，纹理更适合表达访问意图。

不推荐纹理内存的情况

数据随机访问、不具局部性。
需要在 device 上读写。
数据频繁更新（因为更新后要重新 bind）。

通常使用以下两个方式访问纹理内存：

传统 Bind Texture API（不推荐，我都用不了了）

一、纹理类型声明（texture<>)

template <typename T, int dim, cudaReadMode mode = cudaReadModeElementType>
texture<T, dim, mode> texRef;

参数	含义
`T`	数据类型（如 `float`、`uchar4` 等）
`dim`	纹理维度：1（一维）、2（二维）、3（三维）
`mode`	读取模式：`cudaReadModeElementType`（返回原值），或 `cudaReadModeNormalizedFloat`（转为 float 归一化）

texture 在 CUDA 中看起来像模板，其实不是标准 C++ 模板类型。

示例：

texture<float, 1, cudaReadModeElementType> tex1D;     // 一维 float
texture<uchar4, 2, cudaReadModeNormalizedFloat> tex2D; // 二维 uchar4，自动归一化为 float

归一化规则如下：

原始类型	读取结果（float）	归一化范围
`uchar` / `uchar4`	转成 float / float4	[0.0, 1.0]
`char` / `char4`	转成 float / float4	[-1.0, 1.0]
`ushort`	[0.0, 1.0]
`short`	[-1.0, 1.0]
`float`	无变化（返回原 float）	—

二、资源绑定函数（在 host 端绑定内存）

绑定将纹理变量与实际数据建立连接，有两种常用情况：

1. 绑定线性内存（一维）

cudaBindTexture(NULL, texRef, devPtr, size);

参数	含义
`NULL`	offset，一般设置为 NULL
`texRef`	要绑定的纹理变量
`devPtr`	设备内存指针（来自 `cudaMalloc`）
`size`	要绑定的数据大小（以字节为单位）

2. 绑定二维内存（使用 pitch）

cudaBindTexture2D(NULL, texRef, devPtr, desc, width, height, pitch);

参数	含义
`desc`	通道描述符（数据格式）
`width`, `height`	数据宽高
`pitch`	使用 `cudaMallocPitch` 返回的 pitch 值
`devPtr`	设备二维数组起始指针

创建通道描述符的函数：

cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

三、在 kernel 中读取纹理

维度	读取函数	示例
1D	`tex1D(texRef, x)`	`float val = tex1D(texRef, x);`
2D	`tex2D(texRef, x, y)`	`float val = tex2D(texRef, x, y);`
3D	`tex3D(texRef, x, y, z)`	`float val = tex3D(texRef, x, y, z);`

注意：

x, y, z 是 float 类型坐标，不是 int 下标（使用 tex2D 时）。
坐标是以像素为单位的中心点。例如访问 tex2D(texRef, 0.5f, 0.5f) 通常表示第一行第一列。

CUDA 遵循图形硬件常用的采样规范：默认将 (i + 0.5, j + 0.5) 作为第 i 行第 j 列像素的中心坐标。

四、解绑纹理（可选）

cudaUnbindTexture(texRef);

示例代码（完整）

#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <iostream>// 声明二维纹理
texture<float, 2, cudaReadModeElementType> texRef;// CUDA kernel，读取纹理
__global__ void texKernel(float* output, int width, int height) {int x = threadIdx.x + blockDim.x * blockIdx.x;int y = threadIdx.y + blockDim.y * blockIdx.y;if (x < width && y < height) {float val = tex2D(texRef, x + 0.5f, y + 0.5f); // 注意坐标是 floatoutput[y * width + x] = val;}
}int main() {const int width = 512, height = 512;// 分配 host 和 device 内存float* h_data = new float[width * height];float* d_data;cudaMalloc(&d_data, width * height * sizeof(float));// 初始化 host 数据for (int i = 0; i < width * height; ++i) h_data[i] = float(i);// 设备内存和 pitch 分配float* d_texData;size_t pitch;cudaMallocPitch(&d_texData, &pitch, width * sizeof(float), height);cudaMemcpy2D(d_texData, pitch, h_data, width * sizeof(float),width * sizeof(float), height, cudaMemcpyHostToDevice);// 创建通道描述符cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();// 绑定纹理cudaBindTexture2D(NULL, texRef, d_texData, desc, width, height, pitch);// 启动 kerneldim3 block(16, 16);dim3 grid((width + 15) / 16, (height + 15) / 16);texKernel << <grid, block >> > (d_data, width, height);// 拷贝回 host 验证cudaMemcpy(h_data, d_data, width * height * sizeof(float), cudaMemcpyDeviceToHost);std::cout << "Sample: " << h_data[100] << std::endl;// 清理cudaUnbindTexture(texRef);cudaFree(d_texData);cudaFree(d_data);delete[] h_data;return 0;
}

2. 现代 CUDA：使用 `cudaTextureObject_t`

1. `cudaCreateTextureObject`

cudaError_t cudaCreateTextureObject(cudaTextureObject_t *pTexObject,const struct cudaResourceDesc *pResDesc,const struct cudaTextureDesc *pTexDesc,const struct cudaResourceViewDesc *pResViewDesc);

参数	说明
`pTexObject`	输出：返回创建的纹理对象
`pResDesc`	指定绑定的内存资源（如数组或线性内存）
`pTexDesc`	指定采样模式（过滤、边界处理、是否归一化）
`pResViewDesc`	可选资源视图设置（通常设为 `nullptr`）

2. `cudaDestroyTextureObject`

cudaError_t cudaDestroyTextureObject(cudaTextureObject_t texObj);

释放纹理对象资源。

3. 在 CUDA device 代码中使用：

__device__ float tex2D(cudaTextureObject_t tex, float x, float y);

注意 tex2D() 是 device 代码中的函数，但输入参数是 cudaTextureObject_t 类型。

cudaArray_t

CUDA现代纹理对象的设计是抽象的，它支持多种资源类型，但底层绑定到纹理对象的资源有两种主要类型：

cudaArray —— 专门用于纹理/表面（texture/surface）访问的特殊内存结构，支持硬件纹理缓存和高效采样，通常用于二维、三维纹理。
线性内存（linear memory） —— 由 cudaMalloc 分配的一般全局内存。

为什么大多数情况下需要 `cudaArray`

硬件优化支持：二维、三维纹理访问需要高效的硬件缓存、滤波和边界处理，这些功能只能用 cudaArray 支持的内存类型。
支持二维或三维布局：cudaArray 是为二维/三维纹理设计的特殊内存，保证内存布局紧凑，适合纹理硬件快速寻址。
API要求：很多 cudaResourceDesc 绑定纹理时，二维纹理绑定必须是 cudaArray，不支持直接绑定普通线性内存（除非用特定方式创建纹理对象）。

cudaChannelFormatDesc

cudaChannelFormatDesc cudaCreateChannelDesc<Type>();

作用：描述纹理元素中每个通道的数据格式，比如 float、uchar4、int2 等。
用途：告诉 CUDA 纹理硬件每个像素的数据类型和格式。
为什么必须：CUDA需要知道数据的内存排布，保证正确读取。

cudaMallocArray

cudaError_t cudaMallocArray(cudaArray_t *array, const cudaChannelFormatDesc *desc, size_t width, size_t height = 0, unsigned int flags = 0);

作用：分配二维或三维的 CUDA 纹理专用数组 cudaArray_t。
参数：
- array：返回分配的数组句柄。
- desc：通道格式描述。
- width：纹理宽度（像素数）。
- height：纹理高度（像素数），默认0表示一维。
- flags：保留，通常0。
内存特点：
- 不能用普通指针访问，只能通过 CUDA API 或纹理对象访问。
- 内存布局优化，支持硬件滤波器。
为什么必须：
- 绑定二维纹理时，硬件和驱动只支持绑定 cudaArray_t。
- 线性内存无法提供高效的二维采样。

cudaMemcpy2DToArray

cudaError_t cudaMemcpy2DToArray(cudaArray_t dst, size_t wOffset, size_t hOffset,const void *src, size_t spitch,size_t width, size_t height,cudaMemcpyKind kind);

作用：将二维数据从主机或设备内存复制到 CUDA 数组中。
参数：
- dst：目标 CUDA array。
- wOffset, hOffset：二维拷贝起始偏移。
- src：源内存地址。
- spitch：源行跨度（字节数）。
- width, height：拷贝区域尺寸（像素或元素数）。
- kind：拷贝方向，如 cudaMemcpyHostToDevice。
为什么必须：
- 只能通过专用函数将数据拷贝到 cudaArray。
- 直接用 cudaMemcpy 无法拷贝到 cudaArray。

cudaResourceDesc

cudaResourceDesc 是一个结构体，用来描述绑定给纹理对象（cudaTextureObject_t）的底层资源。它告诉 CUDA 纹理对象：

资源类型 是什么（数组？线性内存？Mipmapped数组？）
资源的具体数据指针或句柄
资源内存的具体布局和格式（必要时）

简易版定义

struct cudaResourceDesc {enum cudaResourceType resType;             /**< Resource type */union {struct {cudaArray_t array;                 /**< CUDA array */} array;struct {cudaMipmappedArray_t mipmap;       /**< CUDA mipmapped array */} mipmap;struct {void *devPtr;                      /**< Device pointer */struct cudaChannelFormatDesc desc; /**< Channel descriptor */size_t sizeInBytes;                /**< Size in bytes */} linear;struct {void *devPtr;                      /**< Device pointer */struct cudaChannelFormatDesc desc; /**< Channel descriptor */size_t width;                      /**< Width of the array in elements */size_t height;                     /**< Height of the array in elements */size_t pitchInBytes;               /**< Pitch between two rows in bytes */} pitch2D;} res;
};

字段名	说明
`resType`	资源类型，枚举值，如 `cudaResourceTypeArray`、`cudaResourceTypeLinear`、`cudaResourceTypeMipmappedArray` 等。决定联合体中哪个字段生效。
`array.array`	如果资源是二维或三维纹理专用的 CUDA array，存放其句柄。
`linear.devPtr`	如果资源是线性内存（通过 `cudaMalloc` 分配），指向该内存指针。
`linear.desc`	线性内存的通道格式描述（格式、通道数、数据类型）。
`linear.sizeInBytes`	线性内存大小，字节单位。
`mipmap.mipmap`	多级渐远纹理（mipmapped array）指针。

typedef enum cudaResourceType {cudaResourceTypeArray = 0,cudaResourceTypeMipmappedArray = 1,cudaResourceTypeLinear = 2,cudaResourceTypePitch2D = 3
} cudaResourceType;

cudaResourceTypeArray：资源是 cudaArray_t。
cudaResourceTypeMipmappedArray：资源是 mipmap 纹理。
cudaResourceTypeLinear：资源是线性内存。
cudaResourceTypePitch2D：资源是 pitched（有行跨度）二维线性内存。

举个最常见的二维纹理绑定例子：

cudaResourceDesc resDesc = {};
resDesc.resType = cudaResourceTypeArray;    // 资源类型是cudaArray
resDesc.res.array.array = cuArray;           // 绑定之前分配好的cudaArray句柄

cudaTextureDesc

cudaTextureDesc 是一个结构体，用来描述纹理对象的行为属性，例如：

纹理访问时是否进行归一化坐标转换
过滤模式（如何插值采样）
地址模式（纹理坐标越界时如何处理）
读取时是否使用线性插值
是否启用边界颜色等

简单说，它定义了纹理访问时的“采样行为”和“边界处理”。

struct __device_builtin__ cudaTextureDesc
{/*** Texture address mode for up to 3 dimensions*/enum cudaTextureAddressMode addressMode[3];/*** Texture filter mode*/enum cudaTextureFilterMode  filterMode;/*** Texture read mode*/enum cudaTextureReadMode    readMode;/*** Perform sRGB->linear conversion during texture read*/int                         sRGB;/*** Texture Border Color*/float                       borderColor[4];/*** Indicates whether texture reads are normalized or not*/int                         normalizedCoords;/*** Limit to the anisotropy ratio*/unsigned int                maxAnisotropy;/*** Mipmap filter mode*/enum cudaTextureFilterMode  mipmapFilterMode;/*** Offset applied to the supplied mipmap level*/float                       mipmapLevelBias;/*** Lower end of the mipmap level range to clamp access to*/float                       minMipmapLevelClamp;/*** Upper end of the mipmap level range to clamp access to*/float                       maxMipmapLevelClamp;/*** Disable any trilinear filtering optimizations.*/int                         disableTrilinearOptimization;/*** Enable seamless cube map filtering.*/int                         seamlessCubemap;
};

字段名	说明	取值示例
`addressMode[3]`	三个维度（x,y,z）的地址模式，决定纹理坐标越界如何处理	`cudaAddressModeWrap` (循环)
		`cudaAddressModeClamp` (钳制)
		`cudaAddressModeMirror` (镜像)
`filterMode`	纹理过滤模式，决定采样时是否做线性插值	`cudaFilterModePoint` (最近邻采样)
		`cudaFilterModeLinear` (线性插值)
`readMode`	纹理读取时数据如何转换	`cudaReadModeElementType`（直接读取）
		`cudaReadModeNormalizedFloat`（归一化到[0,1]）
`normalizedCoords`	是否使用归一化纹理坐标（坐标范围0~1）	0（不归一化，坐标直接对应像素索引）
		1（归一化）
`sRGB`	是否启用sRGB颜色空间转换	0或1
`maxAnisotropy`	各向异性过滤最大等级，通常设置为1（关闭）	1或更高
`mipmapFilterMode`	Mipmap过滤模式	同`filterMode`
`mipmapLevelBias`	Mipmap采样时的层级偏差	float值
`minMipmapLevelClamp`	Mipmap采样的最小层级限制	float值
`maxMipmapLevelClamp`	Mipmap采样的最大层级限制	float值

二维的一般设置

struct cudaTextureDesc texDesc = {};
texDesc.addressMode[0] = cudaAddressModeClamp;
texDesc.addressMode[1] = cudaAddressModeClamp;
texDesc.filterMode = cudaFilterModePoint; // 也可以是 cudaFilterModeLinear
texDesc.readMode = cudaReadModeElementType;
texDesc.normalizedCoords = 0; // 使用非归一化坐标

示例代码（完整）

#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <iostream>// 核函数
__global__ void readTextureKernel(cudaTextureObject_t texObj, float* output, int width, int height) {int x = threadIdx.x + blockIdx.x * blockDim.x;int y = threadIdx.y + blockIdx.y * blockDim.y;if (x < width && y < height) {float u = x + 0.5f; // 中心采样float v = y + 0.5f;output[y * width + x] = tex2D<float>(texObj, u, v);}
}int main() {const int width = 4, height = 4;size_t size = width * height * sizeof(float);// 创建 host 数据float h_data[width * height];for (int i = 0; i < width * height; ++i) h_data[i] = i;// 分配并复制 device 数据float* d_data;cudaMalloc(&d_data, size);cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice);// 创建 CUDA array（必须）cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();cudaArray_t cuArray;cudaMallocArray(&cuArray, &channelDesc, width, height);cudaMemcpy2DToArray(cuArray, 0, 0, h_data, width * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice);// 设置资源描述符struct cudaResourceDesc resDesc = {};resDesc.resType = cudaResourceTypeArray;resDesc.res.array.array = cuArray;// 设置纹理描述符struct cudaTextureDesc texDesc = {};texDesc.addressMode[0] = cudaAddressModeClamp;texDesc.addressMode[1] = cudaAddressModeClamp;texDesc.filterMode = cudaFilterModePoint; // 也可以是 cudaFilterModeLineartexDesc.readMode = cudaReadModeElementType;texDesc.normalizedCoords = 0; // 使用非归一化坐标// 创建纹理对象cudaTextureObject_t texObj = 0;cudaCreateTextureObject(&texObj, &resDesc, &texDesc, nullptr);// 输出缓冲区float* d_output;cudaMalloc(&d_output, size);// 启动核函数dim3 block(8, 8);dim3 grid((width + 7) / 8, (height + 7) / 8);readTextureKernel << <grid, block >> > (texObj, d_output, width, height);// 拷回结果float h_output[width * height];cudaMemcpy(h_output, d_output, size, cudaMemcpyDeviceToHost);// 打印结果for (int i = 0; i < width * height; ++i)std::cout << h_output[i] << " ";std::cout << std::endl;// 清理cudaDestroyTextureObject(texObj);cudaFreeArray(cuArray);cudaFree(d_data);cudaFree(d_output);return 0;
}

查看全文

http://www.lryc.cn/news/571325.html