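I believe all the pieces needed to demonstrate a kernel writing to a 3D surface (bound to an underlying 3D cudaArray), followed by another kernel texturing (i.e. with automatic interpolation) from the same data (a 3D texture bound to the same underlying 3D cudaArray), are contained in the volumetric filtering CUDA sample code http://docs.nvidia.com/cuda/cuda-samples/index.html#volumetric-filtering-with-3d-textures-and-surface-writes.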
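The only conceptual difference is that the sample code uses 2 separate underlying 3D cudaArrays (one for the texture, one for the surface). We can combine them into a single cudaArray, so that the data written through the surface is subsequently read during the texturing operation.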
Here's a fully worked example:
$ cat texsurf.cu
#include <stdio.h>
#include <helper_cuda.h>
// legacy texture and surface references (deprecated, removed in CUDA 12.0)
texture<float, cudaTextureType3D, cudaReadModeElementType> volumeTexIn;
surface<void, 3> volumeTexOut;
__global__ void
surf_write(float *data,cudaExtent volumeSize)
{
int x = blockIdx.x*blockDim.x + threadIdx.x;
int y = blockIdx.y*blockDim.y + threadIdx.y;
int z = blockIdx.z*blockDim.z + threadIdx.z;
if (x >= volumeSize.width || y >= volumeSize.height || z >= volumeSize.depth)
{
return;
}
float output = data[z*(volumeSize.width*volumeSize.height)+y*(volumeSize.width)+x];
// surface writes need byte offsets for x!
surf3Dwrite(output,volumeTexOut,x * sizeof(float),y,z);
}
__global__ void
tex_read(float x, float y, float z){
printf("x: %f, y: %f, z:%f, val: %f\n", x,y,z,tex3D(volumeTexIn,x,y,z));
}
void runtest(float *data, cudaExtent vol, float x, float y, float z)
{
// create 3D array
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
cudaArray_t content;
checkCudaErrors(cudaMalloc3DArray(&content, &channelDesc, vol, cudaArraySurfaceLoadStore));
// copy data to device
float *d_data;
checkCudaErrors(cudaMalloc(&d_data, vol.width*vol.height*vol.depth*sizeof(float)));
checkCudaErrors(cudaMemcpy(d_data, data, vol.width*vol.height*vol.depth*sizeof(float), cudaMemcpyHostToDevice));
dim3 blockSize(8,8,8);
dim3 gridSize((vol.width+7)/8,(vol.height+7)/8,(vol.depth+7)/8);
// enable hardware trilinear interpolation for tex3D() fetches
volumeTexIn.filterMode = cudaFilterModeLinear;
checkCudaErrors(cudaBindSurfaceToArray(volumeTexOut,content));
surf_write<<<gridSize, blockSize>>>(d_data, vol);
// bind array to 3D texture
checkCudaErrors(cudaBindTextureToArray(volumeTexIn, content));
tex_read<<<1,1>>>(x, y, z);
checkCudaErrors(cudaDeviceSynchronize());
cudaFreeArray(content);
cudaFree(d_data);
return;
}
int main(){
const int dim = 8;
float *data = (float *)malloc(dim*dim*dim*sizeof(float));
for (int z = 0; z < dim; z++)
for (int y = 0; y < dim; y++)
for (int x = 0; x < dim; x++)
data[z*dim*dim+y*dim+x] = z*100+y*10+x;
cudaExtent vol = {dim,dim,dim};
runtest(data, vol, 1.5, 1.5, 1.5);
runtest(data, vol, 1.6, 1.6, 1.6);
free(data);
return 0;
}
$ nvcc -I/usr/local/cuda/samples/common/inc texsurf.cu -o texsurf
$ cuda-memcheck ./texsurf
========= CUDA-MEMCHECK
x: 1.500000, y: 1.500000, z:1.500000, val: 111.000000
x: 1.600000, y: 1.600000, z:1.600000, val: 122.234375
========= ERROR SUMMARY: 0 errors
$
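The example above uses texture and surface references, which were deprecated in CUDA 11 and removed in CUDA 12.0, so it will not build with current toolkits. Below is a minimal sketch of the same idea using texture and surface objects, assuming CUDA 11 or later. The `CHECK` macro is a hypothetical stand-in for `checkCudaErrors` from `helper_cuda.h`, and this `runtest` returns the sampled value through device memory instead of printing from the kernel. The key point is unchanged: one cudaArray allocated with `cudaArraySurfaceLoadStore` is described by a single `cudaResourceDesc`, and both the surface object (written by `surf_write`) and the texture object (sampled with linear filtering by `tex_read`) are created from it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// hypothetical error-check macro, standing in for checkCudaErrors()
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

__global__ void surf_write(const float *data, cudaExtent vol, cudaSurfaceObject_t surf)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    int z = blockIdx.z*blockDim.z + threadIdx.z;
    if (x >= (int)vol.width || y >= (int)vol.height || z >= (int)vol.depth) return;
    float v = data[(z*vol.height + y)*vol.width + x];
    // surface writes need byte offsets for x
    surf3Dwrite(v, surf, (int)(x * sizeof(float)), y, z);
}

__global__ void tex_read(cudaTextureObject_t tex, float x, float y, float z, float *out)
{
    *out = tex3D<float>(tex, x, y, z);  // linearly filtered fetch
}

float runtest(const float *h_data, cudaExtent vol, float x, float y, float z)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMalloc3DArray(&arr, &desc, vol, cudaArraySurfaceLoadStore));

    // one resource description of the array, shared by surface and texture
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaSurfaceObject_t surf;
    CHECK(cudaCreateSurfaceObject(&surf, &res));

    cudaTextureDesc td = {};
    td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;  // unnormalized coordinates, as in the original
    cudaTextureObject_t tex;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    size_t bytes = vol.width*vol.height*vol.depth*sizeof(float);
    float *d_data, *d_out;
    CHECK(cudaMalloc(&d_data, bytes));
    CHECK(cudaMalloc(&d_out, sizeof(float)));
    CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    dim3 block(8, 8, 8);
    dim3 grid((vol.width+7)/8, (vol.height+7)/8, (vol.depth+7)/8);
    surf_write<<<grid, block>>>(d_data, vol, surf);
    CHECK(cudaGetLastError());
    tex_read<<<1, 1>>>(tex, x, y, z, d_out);  // same stream: runs after surf_write
    CHECK(cudaGetLastError());

    float result;
    CHECK(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(d_data));
    CHECK(cudaFree(d_out));
    return result;
}

int main()
{
    const int dim = 8;
    float *data = (float *)malloc(dim*dim*dim*sizeof(float));
    for (int z = 0; z < dim; z++)
        for (int y = 0; y < dim; y++)
            for (int x = 0; x < dim; x++)
                data[(z*dim + y)*dim + x] = z*100 + y*10 + x;
    cudaExtent vol = make_cudaExtent(dim, dim, dim);
    printf("val: %f\n", runtest(data, vol, 1.5f, 1.5f, 1.5f));
    printf("val: %f\n", runtest(data, vol, 1.6f, 1.6f, 1.6f));
    free(data);
    return 0;
}
```

Because both kernels go into the same (default) stream, `tex_read` starts only after `surf_write` has completed. The texture cache is not kept coherent with surface writes made within the same kernel launch, so keeping the write and the filtered read in separate launches, as both versions do, is what makes this pattern safe.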
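I'm not going to try to give a full tutorial on linear texture filtering here. There are plenty of other questions on this site that cover the details of indexing and filtering, and that doesn't seem to be the crux of this question. I chose the points (1.5, 1.5, 1.5) and (1.6, 1.6, 1.6) because they make the results easy to check against the underlying data; the results make sense to me.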
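As a rough check (a worked sketch, not a full treatment), here is how the two printed values follow from the linear-filtering formula in the texture-fetching appendix of the CUDA C++ Programming Guide, assuming unnormalized coordinates as in the example. Each coordinate is shifted by half a texel, x_B = x - 0.5, then split into an integer index i = floor(x_B) and a fractional weight alpha = x_B - i (likewise j, beta for y and k, gamma for z). Because the test data are linear in the indices, T[k][j][i] = 100k + 10j + i, and the weights in each dimension sum to 1, trilinear interpolation simply evaluates that linear function at the fractional index:

$$
\begin{aligned}
\operatorname{tex}(x,y,z) &= \sum_{a,b,c\in\{0,1\}} w_a(\alpha)\,w_b(\beta)\,w_c(\gamma)\,T[k+c][j+b][i+a], \qquad w_0(t)=1-t,\quad w_1(t)=t \\
&= 100\,(k+\gamma) + 10\,(j+\beta) + (i+\alpha) \\
(1.5,\,1.5,\,1.5):&\quad i=j=k=1,\quad \alpha=\beta=\gamma=0 \;\Rightarrow\; 111 \\
(1.6,\,1.6,\,1.6):&\quad i=j=k=1,\quad \alpha=\beta=\gamma\approx 0.1 \;\Rightarrow\; 122.1
\end{aligned}
$$

The printed 122.234375 is expected to differ slightly from the exact 122.1: according to the same appendix, the hardware stores the weights alpha, beta, gamma in 9-bit fixed point with 8 fractional bits, so a weight of 0.1 cannot be represented exactly.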