Prerequisites: A100, CUDA 11.6, cuDNN 8, NCCL, zlib1g-dev
git clone --recursive https://github.com/NVIDIA/FasterTransformer.git
cd FasterTransformer
# To be safe, make sure every submodule is initialized
git submodule update --recursive --init
mkdir build
cd build
cmake ..
make -j
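The plain `cmake ..` above does not tell the build which GPU architecture to target. As far as I recall, the FasterTransformer README passes it with `-DSM=xx` (80 for A100) together with `-DCMAKE_BUILD_TYPE=Release`; treat both flags as assumptions and check them against the README in your checkout. The sketch below only prints a suggested command line and does not run the build. The helper name `sm_from_compute_cap` is made up for this note, and the `compute_cap` query field is assumed to be supported by your driver's `nvidia-smi`.

```shell
# Convert a compute capability such as "8.0" into the "80" form used for an SM value.
sm_from_compute_cap() {
    printf '%s\n' "$1" | tr -d '. '
}

# Assumption: this nvidia-smi supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
else
    cap=8.0   # fall back to the A100 value assumed in this note
fi

sm=$(sm_from_compute_cap "$cap")

# Print the suggested commands instead of running them.
echo "cmake -DSM=$sm -DCMAKE_BUILD_TYPE=Release .."
echo "make -j$(nproc)"
```

Capping the job count with `nproc` instead of a bare `make -j` keeps the build from starting an unlimited number of parallel compile jobs, which can exhaust memory on smaller machines.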
Run example:
BF16 data format:
Troubleshooting:
If you hit either of the following errors, resolve it as described below:
fatal error: nccl.h: No such file or directory
fatal error: zlib.h: No such file or directory
Install NCCL, the GPU communication library
GitHub - NVIDIA/nccl: Optimized primitives for collective multi-GPU communication
The steps below all follow the official instructions:
Download the source code:
git clone --recursive https://github.com/NVIDIA/nccl.git
Build from source:
$ cd nccl
$ make -j src.build
deb package:
$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/
tar package:
$ make pkg.txz.build
$ ls build/pkg/txz/
Install (libnccl2 first, because libnccl-dev depends on it):
sudo dpkg -i libnccl2_2.13.4-1+cuda11.6_amd64.deb
sudo dpkg -i libnccl-dev_2.13.4-1+cuda11.6_amd64.deb
Verify that the installation works:
$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
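The `<ngpus>` placeholder above has to be filled in by hand. One possible way to derive it, sketched here under the assumption that `nvidia-smi -L` prints one line starting with `GPU ` per physical device, is to count those lines. The helper name `count_gpu_lines` is hypothetical. In the nccl-tests command, `-b` and `-e` are the minimum and maximum message sizes, `-f` is the multiplication factor between sizes, and `-g` is the number of GPUs per thread.

```shell
# Count physical GPUs in `nvidia-smi -L` style output read from stdin.
# grep -c exits non-zero on a count of 0, so `|| true` keeps the helper's status clean.
count_gpu_lines() {
    grep -c '^GPU ' || true
}

# Run the benchmark only when both the driver tool and the test binary are present.
if command -v nvidia-smi >/dev/null 2>&1 && [ -x ./build/all_reduce_perf ]; then
    ngpus=$(nvidia-smi -L | count_gpu_lines)
    ./build/all_reduce_perf -b 8 -e 256M -f 2 -g "$ngpus"
else
    echo "nvidia-smi or ./build/all_reduce_perf not found; skipping the benchmark"
fi
```

Matching on `^GPU ` rather than counting every line is deliberate: on an A100 with MIG enabled, the listing also contains indented MIG device lines, which should not be counted as separate GPUs for this test.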
While you are at it, install zlib as well:
$ sudo apt-get install zlib1g-dev
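Before rebuilding FasterTransformer, it can help to confirm that the two headers from the error messages are now visible. The following is a minimal sketch: `find_header` is a helper made up for this note, and the list of search directories is an assumption about a typical Ubuntu plus CUDA layout, not something the build system guarantees.

```shell
# Print the full path of the first match for a header in the given directories.
# Returns non-zero when the header is not found in any of them.
find_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
            return 0
        fi
    done
    return 1
}

# Assumed search path: standard include directories plus the default CUDA location.
for h in nccl.h zlib.h; do
    if path=$(find_header "$h" /usr/include /usr/local/include /usr/local/cuda/include); then
        echo "found $h at $path"
    else
        echo "missing $h"
    fi
done
```

If either header is still reported missing after installing the packages, the compiler is likely searching a different include path than the one assumed here, and the directory list should be adjusted to match.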