CUB project GitHub URL:
GitHub - NVIDIA/cub: Cooperative primitives for CUDA C++.
1. Building Thrust
# Clone Thrust and CUB from Github. CUB is located in Thrust's
# `dependencies/cub` submodule.
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust
# Create build directory:
mkdir build
cd build
# Configure:
cmake -DTHRUST_INCLUDE_CUB_CMAKE=ON .. # Command line interface.
# Build. The generator-agnostic form is `cmake --build . -j <num jobs>`
# (invokes make, ninja, etc.); with the Makefile generator, call make directly:
make -j16
# Run tests and examples:
ctest
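2. Building CUB
(1.) Note:
Do not git clone ...cub.git on its own. Instead, git clone ...thrust.git and build CUB from the copy under the dependencies/ directory (the dependencies/cub submodule). Otherwise the build system will include the system Thrust headers, that is, the Thrust installed together with the CUDA Toolkit. Running $ ls /usr/local/cuda/targets/x86_64-linux/include/ lists both the thrust and cub folders.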
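To see which Thrust version each header tree provides, one option is to read THRUST_VERSION from thrust/version.h. The sketch below is a minimal example under stated assumptions: the helper names `read_thrust_version` and `decode_thrust_version` and the checkout path `$HOME/thrust` are hypothetical, and the decoding assumes the major*100000 + minor*100 + subminor scheme described in version.h.

```shell
# Hypothetical helpers for illustration; they are not part of Thrust or CUB.

# Print the raw THRUST_VERSION integer from <include dir>/thrust/version.h.
read_thrust_version() {
  awk '$1 == "#define" && $2 == "THRUST_VERSION" { print $3 }' "$1/thrust/version.h"
}

# Decode the integer: major = V / 100000, minor = V / 100 % 1000, subminor = V % 100.
decode_thrust_version() {
  echo "$(( $1 / 100000 )).$(( $1 / 100 % 1000 )).$(( $1 % 100 ))"
}

# Compare the CUDA Toolkit copy with the checkout.
# The second path is an assumption; point it at your own thrust clone.
for dir in /usr/local/cuda/targets/x86_64-linux/include "$HOME/thrust"; do
  if [ -r "$dir/thrust/version.h" ]; then
    raw="$(read_thrust_version "$dir")"
    if [ -n "$raw" ]; then
      echo "$dir: Thrust $(decode_thrust_version "$raw") (raw $raw)"
    fi
  fi
done
```

If the two trees report different versions, that is a hint that mixing the system headers with a standalone CUB checkout could cause the mismatch described above.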
(2.) Build steps
$ cd dependencies/cub   # run from the root of the thrust checkout
$ ls
# The contents match the CUB GitHub repository, including a CMakeLists.txt
$ mkdir build
$ cd build
$ cmake ..
$ make -j16
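3. Problems you may encounter during the build
With a newer CUDA version, running cmake to generate the Makefile may fail with either of the following two errors, whether you are building CUB or Thrust's examples and tests:
Problem 1. CUDA architecture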
Failed to detect a default CUDA architecture.
Full output:
CMake Error at /usr/local/share/cmake-3.23/Modules/CMakeDetermineCUDACompiler.cmake:601 (message):
Failed to detect a default CUDA architecture.
Compiler output:
Call Stack (most recent call first):
cmake/CubCudaConfig.cmake:1 (enable_language)
CMakeLists.txt:83 (include)
Problem 2. CUDA compiler
-- The CUDA compiler identification is unknown
CMake Error at cmake/CubCudaConfig.cmake:1 (enable_language):
No CMAKE_CUDA_COMPILER could be found.
Tell CMake where to find the compiler by setting either the environment
variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
path to the compiler, or to the compiler name if it is in the PATH.
Call Stack (most recent call first):
CMakeLists.txt:91 (include)
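Solution:
This machine has a GeForce RTX 2080 Ti (Turing architecture), whose compute capability is 7.5.
So, in the top-level CMakeLists.txt of thrust or cub, add the following before the line that enables the CUDA language (for CUB, the include of cmake/CubCudaConfig.cmake that appears in the call stacks above):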
set(CMAKE_CUDA_ARCHITECTURES 75)
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")
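The same two settings can also be supplied at configure time without editing CMakeLists.txt: CMAKE_CUDA_ARCHITECTURES and CMAKE_CUDA_COMPILER can be passed with -D on the cmake command line, and CUDACXX is the environment variable named in the error message. The following is a dry-run sketch under stated assumptions: the helpers `cap_to_arch` and `find_nvcc` are hypothetical names, and /usr/local/cuda/bin/nvcc is assumed as the fallback location.

```shell
# Hypothetical helpers for illustration only.

# Convert a compute capability such as "7.5" into CMake's form "75".
cap_to_arch() {
  printf '%s\n' "$1" | tr -d '.'
}

# Choose nvcc: honor CUDACXX, then PATH, then the assumed default install path.
find_nvcc() {
  if [ -n "${CUDACXX:-}" ]; then
    printf '%s\n' "$CUDACXX"
  elif command -v nvcc >/dev/null 2>&1; then
    command -v nvcc
  else
    printf '%s\n' "/usr/local/cuda/bin/nvcc"
  fi
}

# Dry run: print the configure command instead of executing it.
echo "cmake -DCMAKE_CUDA_ARCHITECTURES=$(cap_to_arch 7.5) -DCMAKE_CUDA_COMPILER=$(find_nvcc) .."
```

Run the printed command from the build directory. On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` may report the compute capability directly; treat that query field as an assumption to check against your driver version.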
Problem 3. Link failure: the ld command fails
The error message looks like:
collect2: fatal error: ld terminated with signal 9
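This link error can occur when running make -j16. The ld step fails because linking many targets at the same time exhausts both memory and swap space. Signal 9 is SIGKILL, which on Linux is typically sent by the kernel's out-of-memory killer.
Solution:
1. First, try giving up the parallel build: change make -j16 to plain make.
2. Try increasing the swap space, for example: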
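sudo mkdir /swapfile   # create the directory at / so the following cd succeeds
cd /swapfile
# About 20 GB: 20,000,000 blocks of 1024 bytes
sudo dd if=/dev/zero of=swap bs=1024 count=20000000
sudo chmod 600 swap    # a swap file should be accessible by root only
sudo mkswap -f swap
sudo swapon swap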
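Between a fully serial build and adding swap, a middle ground is to cap the job count by available memory. The sketch below is only a heuristic: the helper `pick_jobs` is a hypothetical name, and the 2 GiB-per-job budget is an assumed figure to tune for your machine, not a measured requirement of Thrust or CUB.

```shell
# Hypothetical helper: pick_jobs CORES MEM_KB PER_JOB_KB
# Prints the largest job count that fits the memory budget,
# clamped to the range [1, CORES].
pick_jobs() {
  cores="$1"
  mem_kb="$2"
  per_job_kb="$3"
  n=$(( mem_kb / per_job_kb ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  if [ "$n" -gt "$cores" ]; then
    n="$cores"
  fi
  echo "$n"
}

# Suggest a job count on Linux, assuming about 2 GiB (2097152 kB) per job.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
  mem_kb="$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)"
  echo "suggested: make -j$(pick_jobs "$(nproc)" "${mem_kb:-0}" 2097152)"
fi
```

If the suggested count still triggers the ld failure, lowering the count further or raising the per-job budget is the natural next step before resorting to a serial build.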