Error message:
AssertionError:
The NVIDIA driver on your system is too old (found version 10000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
Where the error occurred:
File "/users4/zsun/pytorch/OpenNMT-py/onmt/train_single.py", line 38, in configure_process
torch.cuda.set_device(device_id)
File "/users4/zsun/anaconda3/envs/onmt_summary100W/lib/python3.5/site-packages/torch/cuda/__init__.py", line 300, in set_device
torch._C._cuda_setDevice(device)
File "/users4/zsun/anaconda3/envs/onmt_summary100W/lib/python3.5/site-packages/torch/cuda/__init__.py", line 192, in _lazy_init
_check_driver()
File "/users4/zsun/anaconda3/envs/onmt_summary100W/lib/python3.5/site-packages/torch/cuda/__init__.py", line 111, in _check_driver
of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
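The message offers two options: upgrade the NVIDIA driver, or install a PyTorch build that matches the CUDA version the local driver supports. In other words, the PyTorch on my machine is too new and the CUDA driver too old, so the two do not match. The CUDA installation belongs to the shared server and we cannot change it, so the fix has to be on the PyTorch side.
First, check the local CUDA and cuDNN versions.
Reference: https://blog.csdn.net/leviopku/article/details/84851244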
(onmt_summary100W) [zsun@gpu09 OpenNMT-py]$ cat /usr/local/cuda/version.txt
CUDA Version 10.0.130
(onmt_summary100W) [zsun@gpu09 OpenNMT-py]$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
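The `found version 10000` in the error message can be read the same way as the version above. CUDA encodes versions as 1000 * major + 10 * minor, so 10000 means 10.0. Below is a minimal sketch of that decoding; the function names are hypothetical, and the CUDA 10.1 build version used in the example is an assumption about the installed wheel, not something the output above confirms.

```python
def decode_cuda_version(raw):
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    major = raw // 1000
    minor = (raw % 1000) // 10
    return major, minor


def driver_supports(driver_raw, build_major, build_minor):
    """True if the driver's CUDA version is at least the one a build targets."""
    return decode_cuda_version(driver_raw) >= (build_major, build_minor)


if __name__ == "__main__":
    # 10000 is the value reported in the AssertionError.
    print(decode_cuda_version(10000))
    # Assumption: the installed PyTorch wheel targets CUDA 10.1.
    print(driver_supports(10000, 10, 1))
    # A wheel built for CUDA 10.0 would pass the same check.
    print(driver_supports(10000, 10, 0))
```

Under that assumption, the check fails for a 10.1 build and passes for a 10.0 build, which matches the behaviour described in this post.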
Check the PyTorch version (pip list shows it as well):
>>> torch.__version__
'1.3.1'
torch and torchvision are usually installed together, so check torchvision too:
>>> torchvision.__version__
'0.4.2'
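Besides the package version, it may help to see which CUDA version the installed PyTorch build was compiled for, which `torch.version.cuda` reports (it is `None` for CPU-only builds). The following is a small sketch; `describe_build` is a hypothetical helper, and it returns `None` when torch is not installed instead of raising.

```python
import importlib


def describe_build():
    """Return the torch version and the CUDA version it was built for, or None."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in this environment.
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the binary was compiled against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
    }


if __name__ == "__main__":
    print(describe_build())
```

If `built_for_cuda` is newer than the version the driver reports, the driver check shown in the traceback is expected to fail.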
Following the hint in the error message, I checked the PyTorch site at https://pytorch.org/get-started/previous-versions/ and found this entry:
# CUDA 10.0
pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
So torch 1.3 is probably too new for CUDA 10.0. I used the command above to reinstall torch==1.2.0 and then reran my program.
Finally, the error is gone!
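To confirm the fix with more than the absence of the assertion, a quick check like the one below can be run inside the same conda environment. This is only a sketch: `cuda_sanity_check` is a hypothetical helper, and it reports a message instead of raising when torch or CUDA is unavailable.

```python
def cuda_sanity_check():
    """Try a tiny GPU computation and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to this PyTorch build"
    # Allocate a small tensor on the GPU; this forces CUDA initialisation,
    # the step that failed in the traceback above.
    x = torch.ones(2, 2, device="cuda")
    return "ok: sum on GPU = {}".format(x.sum().item())


if __name__ == "__main__":
    print(cuda_sanity_check())
```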