Origin of the problem
Recently I installed the latest version of paddle with the official conda install command:
conda install paddlepaddle-gpu==2.4.1 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
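This command installs cudatoolkit=11.7 and cuDNN 8.6, which the GPU build of paddle requires, along with some dependency packages.
However, running paddle.utils.run_check() in the Python interactive shell raised an error: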
Running verify PaddlePaddle program ...
W0114 22:09:11.388418 103110 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.7
W0114 22:09:11.388692 103110 dynamic_loader.cc:307] The third-party dynamic library (libcudnn.so) that Paddle depends on is not configured correctly. (error code is /usr/local/cuda/lib64/libcudnn.so: cannot open shared object file: No such file or directory)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
- Windows: set PATH by `set PATH=XXX;
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/utils/install_check.py", line 269, in run_check
_run_static_single(use_cuda, use_xpu, use_npu)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/utils/install_check.py", line 173, in _run_static_single
exe.run(startup_prog)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/executor.py", line 1463, in run
six.reraise(*sys.exc_info())
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/six.py", line 719, in reraise
raise value
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/executor.py", line 1450, in run
res = self._run_impl(program=program,
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/executor.py", line 1661, in _run_impl
return new_exe.run(scope, list(feed.keys()), fetch_list,
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/executor.py", line 631, in run
tensors = self._new_exe.run(scope, feed_names,
RuntimeError: In user code:
File "<stdin>", line 1, in <module>
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/utils/install_check.py", line 269, in run_check
_run_static_single(use_cuda, use_xpu, use_npu)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/utils/install_check.py", line 159, in _run_static_single
input, out, weight = _simple_network()
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/utils/install_check.py", line 33, in _simple_network
weight = paddle.create_parameter(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/layers/tensor.py", line 152, in create_parameter
return helper.create_parameter(attr, shape, convert_dtype(dtype), is_bias,
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/layer_helper_base.py", line 381, in create_parameter
self.startup_program.global_block().create_parameter(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/framework.py", line 3965, in create_parameter
initializer(param, self)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/initializer.py", line 56, in __call__
return self.forward(param, block)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/initializer.py", line 184, in forward
op = block.append_op(type="fill_constant",
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/framework.py", line 4017, in append_op
op = Operator(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/paddle/fluid/framework.py", line 2858, in __init__
for frame in traceback.extract_stack():
PreconditionNotMetError: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
[Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)
[operator < fill_constant > error]
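What puzzled me was that I had clearly installed cuDNN in the virtual environment, yet paddle could not find the cuDNN library.
Investigating the problem
After conda installs CUDA and cuDNN, the files are stored under the lib and include directories of the conda virtual environment. Run: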
cd ~/miniconda3/envs/py310
ls lib/ | grep cudnn
ls include/ | grep cudnn
These commands listed the cuDNN runtime libraries and header files, so cuDNN had indeed been installed successfully.
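The same check can be sketched in Python, which is handy when the environment path is not known in advance. This is a minimal sketch, not part of the original procedure: the helper name find_cudnn_files is hypothetical, and it assumes CONDA_PREFIX (set by conda activate) points at the active environment, falling back to the path used in this post.

```python
import os
from pathlib import Path


def find_cudnn_files(env_prefix):
    """Return cuDNN-related file names found in <env_prefix>/lib and <env_prefix>/include."""
    found = {}
    for sub in ("lib", "include"):
        directory = Path(env_prefix) / sub
        if directory.is_dir():
            # Same filter as `ls <dir> | grep cudnn`
            found[sub] = sorted(p.name for p in directory.iterdir() if "cudnn" in p.name)
        else:
            found[sub] = []
    return found


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; the fallback is the path used in this post.
    prefix = os.environ.get("CONDA_PREFIX", os.path.expanduser("~/miniconda3/envs/py310"))
    for sub, names in find_cudnn_files(prefix).items():
        print(sub, names)
```

An empty list for lib would mean cuDNN is really missing, which is a different problem from the one described here.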
Hypothesis
My guess was that the environment variables were not set correctly. Running
echo $PATH
echo $LD_LIBRARY_PATH
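showed that, sure enough, neither variable contained the path to the conda-installed cuDNN shared libraries and header files.
Note that paths such as /usr/local/cuda/lib/ are where the CUDA installer from the NVIDIA website puts its files. I had uninstalled that copy because it took up too much disk space (11G), and wanted to install cudatoolkit and cudnn through conda instead.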
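Reading a long search path by eye is error-prone, so the check can also be done programmatically. The following is a small sketch under stated assumptions: dir_on_search_path is a hypothetical helper, and the environment prefix falls back to the path used in this post when CONDA_PREFIX is not set.

```python
import os


def dir_on_search_path(directory, path_value):
    """Return True if `directory` is one of the entries of the search path `path_value`."""
    target = os.path.normpath(directory)
    # Skip empty entries produced by leading, trailing, or doubled separators
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normpath(e) == target for e in entries)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "/root/miniconda3/envs/py310")
    lib_dir = os.path.join(prefix, "lib")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(lib_dir, "in LD_LIBRARY_PATH:", dir_on_search_path(lib_dir, current))
```

In the situation described here, the expected result is False, which matches what the echo commands showed.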
Solution
Add the lib and include paths of the conda virtual environment to that environment's environment variables, using the command
conda env config vars set LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/root/miniconda3/envs/py310/include:/root/miniconda3/envs/py310/lib
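Here conda env config vars set overrides the system $LD_LIBRARY_PATH environment variable, and it only affects the current py310 virtual environment. The shell expands $LD_LIBRARY_PATH at the moment the command is run, so the stored value is the current value followed by the two appended directories. Strictly speaking, only the lib directory matters to the dynamic loader; the include entry is harmless but has no effect on library lookup.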
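One detail worth guarding against: if $LD_LIBRARY_PATH is empty when the command is run, the stored value starts with a colon, and the dynamic loader treats an empty entry as the current working directory. The sketch below builds the value without empty or duplicate entries and prints the command for review instead of running it. The helper name append_to_search_path is hypothetical, and adding only the lib directory is my simplification of the command above.

```python
import os


def append_to_search_path(current_value, new_dirs):
    """Append new_dirs to a search path string, skipping empty entries and duplicates."""
    entries = [e for e in current_value.split(os.pathsep) if e]
    for d in new_dirs:
        if d not in entries:
            entries.append(d)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "/root/miniconda3/envs/py310")
    value = append_to_search_path(
        os.environ.get("LD_LIBRARY_PATH", ""),
        [os.path.join(prefix, "lib")],
    )
    # Print the command rather than executing it, so the value can be checked first.
    print(f"conda env config vars set LD_LIBRARY_PATH={value}")
```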
Then exit the current virtual environment and re-enter it:
conda deactivate
conda activate py310
Run the check again:
import paddle
paddle.utils.run_check()
This time paddle detects cuDNN and works normally.
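For a quicker check that does not involve paddle at all, one can ask the dynamic loader directly whether it can open the library. This is only a sketch: can_load_library is a hypothetical helper, and libcudnn.so is the file name taken from the error message earlier in this post. It has to be run in a fresh Python process after reactivating the environment, because the loader reads LD_LIBRARY_PATH when the process starts.

```python
import ctypes


def can_load_library(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found or loaded
        return False
    return True


if __name__ == "__main__":
    # The loader resolves the name through LD_LIBRARY_PATH and the system library cache.
    print("libcudnn.so loadable:", can_load_library("libcudnn.so"))
```

If this prints False while the file exists under the environment's lib directory, the environment variable most likely did not take effect.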