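The Nvidia Performance Primitives (NPP) http://developer.nvidia.com/cuda/nvidia-performance-primitives library provides the nppiFilter function for convolving a user-provided image with a user-provided kernel. For 1D convolution kernels, nppiFilter works properly. However, nppiFilter produces a garbage image for 2D kernels.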
I used the typical Lena image as input:
Here is my experiment with a 1D convolution kernel, which produces good output.
#include <iostream>
#include <npp.h> // provided in CUDA SDK
#include <ImagesCPU.h> // these image libraries are also in CUDA SDK
#include <ImagesNPP.h>
#include <ImageIO.h>

void test_nppiFilter()
{
    npp::ImageCPU_8u_C1 oHostSrc;
    npp::loadImage("Lena.pgm", oHostSrc);
    npp::ImageNPP_8u_C1 oDeviceSrc(oHostSrc); // malloc and memcpy to GPU
    NppiSize kernelSize = {3, 1}; // dimensions of convolution kernel (filter)
    NppiSize oSizeROI = {oHostSrc.width() - kernelSize.width + 1, oHostSrc.height() - kernelSize.height + 1};
    npp::ImageNPP_8u_C1 oDeviceDst(oSizeROI.width, oSizeROI.height); // allocate device image of appropriately reduced size
    npp::ImageCPU_8u_C1 oHostDst(oDeviceDst.size());
    NppiPoint oAnchor = {2, 1}; // found that oAnchor = {2,1} or {3,1} works for kernel [-1 0 1]
    NppStatus eStatusNPP;

    Npp32s hostKernel[3] = {-1, 0, 1}; // convolving with this should do edge detection
    Npp32s* deviceKernel;
    size_t deviceKernelPitch;
    cudaMallocPitch((void**)&deviceKernel, &deviceKernelPitch, kernelSize.width*sizeof(Npp32s), kernelSize.height*sizeof(Npp32s));
    cudaMemcpy2D(deviceKernel, deviceKernelPitch, hostKernel,
                 sizeof(Npp32s)*kernelSize.width, // sPitch
                 sizeof(Npp32s)*kernelSize.width, // width
                 kernelSize.height, // height
                 cudaMemcpyHostToDevice);
    Npp32s divisor = 1; // no scaling

    eStatusNPP = nppiFilter_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                   oDeviceDst.data(), oDeviceDst.pitch(),
                                   oSizeROI, deviceKernel, kernelSize, oAnchor, divisor);

    std::cout << "NppiFilter error status " << eStatusNPP << std::endl; // prints 0 (no errors)
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch()); // memcpy to host
    saveImage("Lena_filter_1d.pgm", oHostDst);
}
Output of the above code with kernel [-1 0 1]. It looks like a reasonable gradient image:
However, nppiFilter outputs a garbage image if I use a 2D convolution kernel. Here are the changes I made to the code above to run it with the 2D kernel [-1 0 1; -1 0 1; -1 0 1]:
NppiSize kernelSize = {3, 3};
Npp32s hostKernel[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
NppiPoint oAnchor = {2, 2}; // note: using anchor {1,1} or {0,0} causes error -24 (NPP_TEXTURE_BIND_ERROR)
saveImage("Lena_filter_2d.pgm", oHostDst);
Below is the output image when using the 2D kernel [-1 0 1; -1 0 1; -1 0 1].
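For comparison, a plain CPU version of the same filter can serve as a reference for what the 2D output should roughly look like. The following is only a minimal sketch under stated assumptions: `convolveValid` and `saturate8u` are hypothetical helper names (not part of NPP), the image and kernel are dense row-major arrays, the kernel is flipped as in a true convolution, and only the "valid" region of size (width - kernelWidth + 1) x (height - kernelHeight + 1) is computed, matching oSizeROI above. How NPP applies the anchor is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of NPP: dense, row-major "valid" 2D convolution.
// Assumption: true convolution, i.e. the kernel is flipped in both axes.
// Returns raw integer sums of size (w - kw + 1) x (h - kh + 1).
std::vector<int> convolveValid(const std::vector<unsigned char>& src, int w, int h,
                               const std::vector<int>& kernel, int kw, int kh)
{
    assert(src.size() == static_cast<std::size_t>(w) * h);
    assert(kernel.size() == static_cast<std::size_t>(kw) * kh);
    assert(kw <= w && kh <= h);

    const int outW = w - kw + 1;
    const int outH = h - kh + 1;
    std::vector<int> dst(static_cast<std::size_t>(outW) * outH);

    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            int sum = 0;
            for (int j = 0; j < kh; ++j) {
                for (int i = 0; i < kw; ++i) {
                    // Kernel element (j, i) is paired with the mirrored source offset.
                    const int sx = x + (kw - 1 - i);
                    const int sy = y + (kh - 1 - j);
                    sum += kernel[j * kw + i] * src[sy * w + sx];
                }
            }
            dst[y * outW + x] = sum;
        }
    }
    return dst;
}

// Clamp a raw sum to the 8-bit range, as an 8u output image requires.
unsigned char saturate8u(int v)
{
    return static_cast<unsigned char>(std::min(255, std::max(0, v)));
}
```

Running this on the Lena pixels with the 3x3 kernel and writing out the clamped values would give an image to compare against Lena_filter_2d.pgm, which may help distinguish a shifted or sign-flipped result from genuinely corrupted output.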
What am I doing wrong?
This StackOverflow post https://stackoverflow.com/questions/12778463/cuda-npp-filters describes a similar problem, as shown in user Steensrup's image: http://1ordrup.dk/kasper/image/Lena_boxFilter5.jpg
A few final notes:
- With the 2D kernel, for some anchor values (e.g., NppiPoint oAnchor = {0, 0} or {1, 1}), I get error -24, which means NPP_TEXTURE_BIND_ERROR according to the NPP User Guide http://developer.download.nvidia.com/compute/DevZone/docs/html/CUDALibraries/doc/NPP_Library.pdf. This issue is mentioned in this StackOverflow post https://stackoverflow.com/questions/12778463/cuda-npp-filters.
- This code is very verbose. This isn't the main question, but does anyone have suggestions for making this code more concise?