这是一个工作示例,显示了纹理对象数组的创建,大致遵循您提供的代码的路径。你可以看到,通过与我放置的纹理参考代码进行比较here,从第一个纹理对象(即第一个内核调用)读取的第一组纹理与从纹理参考示例读取的一组纹理的数值相同(您可能需要调整两个示例代码的网格大小以匹配)。
纹理对象的使用需要计算能力 3.0 或更高。
example:
$ cat t507.cu
#include <helper_cuda.h>
#include <curand.h>
#define NUM_TEX 4
const int SizeNoiseTest = 32;
const int cubeSizeNoiseTest = SizeNoiseTest*SizeNoiseTest*SizeNoiseTest;
static cudaTextureObject_t texNoise[NUM_TEX];
__global__ void AccesTexture(cudaTextureObject_t my_tex)
{
float test = tex3D<float>(my_tex,(float)threadIdx.x,(float)threadIdx.y,(float)threadIdx.z);//by using this the error occurs
printf("thread: %d,%d,%d, value: %f\n", threadIdx.x, threadIdx.y, threadIdx.z, test);
}
void CreateTexture()
{
float *d_NoiseTest;//Device Array with random floats
cudaMalloc((void **)&d_NoiseTest, cubeSizeNoiseTest*sizeof(float));//Allocation of device Array
for (int i = 0; i < NUM_TEX; i++){
//curand Random Generator (needs compiler link -lcurand)
curandGenerator_t gen;
curandCreateGenerator(&gen,CURAND_RNG_PSEUDO_DEFAULT);
curandSetPseudoRandomGeneratorSeed(gen,1235ULL+i);
curandGenerateUniform(gen, d_NoiseTest, cubeSizeNoiseTest);//writing data to d_NoiseTest
curandDestroyGenerator(gen);
//cudaArray Descriptor
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
//cuda Array
cudaArray *d_cuArr;
checkCudaErrors(cudaMalloc3DArray(&d_cuArr, &channelDesc, make_cudaExtent(SizeNoiseTest*sizeof(float),SizeNoiseTest,SizeNoiseTest), 0));
cudaMemcpy3DParms copyParams = {0};
//Array creation
copyParams.srcPtr = make_cudaPitchedPtr(d_NoiseTest, SizeNoiseTest*sizeof(float), SizeNoiseTest, SizeNoiseTest);
copyParams.dstArray = d_cuArr;
copyParams.extent = make_cudaExtent(SizeNoiseTest,SizeNoiseTest,SizeNoiseTest);
copyParams.kind = cudaMemcpyDeviceToDevice;
checkCudaErrors(cudaMemcpy3D(©Params));
//Array creation End
cudaResourceDesc texRes;
memset(&texRes, 0, sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = d_cuArr;
cudaTextureDesc texDescr;
memset(&texDescr, 0, sizeof(cudaTextureDesc));
texDescr.normalizedCoords = false;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp; // clamp
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
checkCudaErrors(cudaCreateTextureObject(&texNoise[i], &texRes, &texDescr, NULL));}
}
int main(int argc, char **argv)
{
CreateTexture();
AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[0]);
AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[1]);
AccesTexture<<<1,dim3(2,2,2)>>>(texNoise[2]);
checkCudaErrors(cudaPeekAtLastError());
checkCudaErrors(cudaDeviceSynchronize());
return 0;
}
编译:
$ nvcc -arch=sm_30 -I/shared/apps/cuda/CUDA-v6.0.37/samples/common/inc -lcurand -o t507 t507.cu
output:
$ cuda-memcheck ./t507
========= CUDA-MEMCHECK
thread: 0,0,0, value: 0.310691
thread: 1,0,0, value: 0.627906
thread: 0,1,0, value: 0.638900
thread: 1,1,0, value: 0.665186
thread: 0,0,1, value: 0.167465
thread: 1,0,1, value: 0.565227
thread: 0,1,1, value: 0.397606
thread: 1,1,1, value: 0.503013
thread: 0,0,0, value: 0.809163
thread: 1,0,0, value: 0.795669
thread: 0,1,0, value: 0.808565
thread: 1,1,0, value: 0.847564
thread: 0,0,1, value: 0.853998
thread: 1,0,1, value: 0.688446
thread: 0,1,1, value: 0.733255
thread: 1,1,1, value: 0.649379
thread: 0,0,0, value: 0.040824
thread: 1,0,0, value: 0.087417
thread: 0,1,0, value: 0.301392
thread: 1,1,0, value: 0.298669
thread: 0,0,1, value: 0.161962
thread: 1,0,1, value: 0.316443
thread: 0,1,1, value: 0.452077
thread: 1,1,1, value: 0.477722
========= ERROR SUMMARY: 0 errors
在本例中,我使用多次调用的相同内核来读取各个纹理对象。应该可以将多个对象传递给同一个内核,但是不建议使用单个对象warp从多个纹理读取(如果可以在代码中避免这种情况)。实际的问题在于四级,我不想深入讨论。最好可以安排代码,以便在任何给定周期上都从同一纹理对象读取扭曲。
请注意,为了简化演示,这CreateTexture()
函数会覆盖先前分配的设备指针,例如d_cuArr
,在循环处理期间。这不是非法的,也不是功能问题,但它增加了内存泄漏的可能性。
如果这是一个问题,我假设您可以修改代码来处理这些代码的释放。此代码的目的是演示使事情正常运行的方法。