Setting up deep learning on a desktop again: Ubuntu 18.04 + TensorFlow-GPU + CUDA 8.0 + cuDNN 6.0
Part 1: Preliminaries (some of these are optional)
1. Install pip:
Open a terminal and run: sudo apt-get install python-pip python-dev
2. Install vim:
sudo apt-get install vim-gtk
Then edit the global config: sudo vim /etc/vim/vimrc (sudo is required; without it you cannot save vimrc).
This file contains the line "syntax on", which enables syntax highlighting; if it is commented out, uncomment it.
Appending the following lines at the end of your vimrc makes Vim more pleasant to work in:
set nu          " show line numbers on the left
set tabstop=4   " set tab width to 4
set nobackup    " do not keep a backup when overwriting a file
set cursorline  " highlight the current line
set ruler       " show the cursor position in the bottom-right status line
set autoindent  " auto-indent new lines
Then press ESC, type :wq, and press Enter to save and exit.
Part 2: Installing the NVIDIA GPU driver on Ubuntu
Open a terminal: sudo apt-get update
Then go to System Settings → Software & Updates → Additional Drivers → select the latest NVIDIA driver → Apply Changes. (If nothing is listed, download the matching driver from the NVIDIA website and install it manually.)
To verify the installation, look for NVIDIA X Server Settings in your applications, or run nvidia-smi in a terminal.
The driver is installed.
Part 3: Installing bazel, the build tool TensorFlow depends on
1. Before installing bazel, install JDK 8:
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
After the installation, verify the Java version: java -version
2. Install bazel:
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
sudo apt install curl
curl is needed to add bazel's release signing key (without it, apt-get update reports an unsigned repository):
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
Install and update bazel:
sudo apt-get update && sudo apt-get install bazel
sudo apt-get upgrade bazel
Verify with: bazel version
Part 4: Installing CUDA 8.0 (read step 4, the installation check, before you start)
1. CUDA 8.0 download: https://developer.nvidia.com/cuda-80-ga2-download-archive
Check that the machine meets the CUDA 8.0 requirements:
lspci | grep -i nvidia
uname -m && cat /etc/*release
gcc --version
uname -r
sudo apt-get install linux-headers-$(uname -r)
sudo sh cuda_8.0.27_linux.run
(When the installer asks whether to install the bundled NVIDIA driver, answer no: the driver from Part 2 is already installed.)
2. Add environment variables:
cd ~
vim .bashrc
Append at the end:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Then run: source ~/.bashrc
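The two export lines can also be appended non-interactively, guarded so that re-running the setup does not duplicate them (a sketch; adjust the paths if CUDA lives somewhere other than /usr/local/cuda-8.0):

```shell
# Append the CUDA 8.0 paths to ~/.bashrc only if they are not already there
grep -q 'cuda-8.0/bin' ~/.bashrc 2>/dev/null || cat >> ~/.bashrc <<'EOF'
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
EOF
```

Open a new terminal (or run source ~/.bashrc) afterwards for the variables to take effect.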
3. Check the compiler:
nvcc -V
If nvcc is not found, you can install Ubuntu's packaged toolkit with: sudo apt install nvidia-cuda-toolkit (note that this package may ship a different CUDA version than 8.0, so first make sure the PATH from step 2 is in effect).
nvcc -V
nvidia-smi
4. Verify that CUDA 8.0 installed correctly:
Enter the NVIDIA_CUDA-8.0_Samples directory and run: make
An error appears!
Cause: CUDA 8.0 does not support gcc 5.0 or newer, so the compiler must be downgraded; here we go to 4.8.
5. Downgrade g++/gcc to 4.8 (4.9 would also be old enough, but 4.9 failed for me, so I use 4.8):
sudo apt-get -y update
sudo apt-get install -y gcc-4.8
sudo apt-get install -y g++-4.8
cd /usr/bin
sudo rm gcc
sudo ln -s gcc-4.8 gcc
sudo rm g++
sudo ln -s g++-4.8 g++
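Deleting and re-creating the /usr/bin symlinks by hand works, but it is easy to get wrong and to forget to undo. A gentler alternative (a sketch; it assumes gcc-4.8/g++-4.8 are installed as above and that Ubuntu 18.04's default gcc-7/g++-7 are present) is to let update-alternatives manage the switch:

```shell
# Register both compiler versions; the higher priority (gcc-4.8 here) wins by default
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 100 \
    --slave /usr/bin/g++ g++ /usr/bin/g++-4.8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 50 \
    --slave /usr/bin/g++ g++ /usr/bin/g++-7

# Confirm which compiler is active, and switch interactively when needed
gcc --version
sudo update-alternatives --config gcc
```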
6. Test the CUDA 8.0 installation:
Enter the NVIDIA_CUDA-8.0_Samples directory and run: sudo make
Then enter 1_Utilities/deviceQuery and run: ./deviceQuery — it should print the GPU's properties and end with Result = PASS.
Run: nvcc -V — it should report release 8.0.
CUDA 8.0 is installed and the demo runs.
Part 5: Installing cuDNN 6.0 (5.1 fails during the TensorFlow build step)
cuDNN 6.0 download: https://developer.nvidia.com/rdp/cudnn-archive
Open a terminal in the folder containing the downloaded archive:
tar -xvzf cudnn-8.0-linux-x64-v6.0.tgz
sudo cp -P cuda/include/cudnn.h /usr/local/cuda-8.0/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64
sudo chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*
Configure environment variables:
gedit ~/.bashrc (no sudo needed to edit your own .bashrc)
Add:
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
Then run: source ~/.bashrc
Configuration done. You can confirm the installed cuDNN version with: grep -A 2 'define CUDNN_MAJOR' /usr/local/cuda-8.0/include/cudnn.h
Part 6: Installing TensorFlow (GPU)
Option 1: install a prebuilt binary with pip
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl
Option 2: build from source
1. Install the packages the TensorFlow build depends on:
sudo apt-get install python-numpy swig python-dev python-wheel
2. Fetch the latest source:
sudo apt-get install git
git clone https://github.com/tensorflow/tensorflow
3. Run the configure script to set up the build options:
cd tensorflow
./configure
Do not just accept the default Y for everything; for most prompts the answer is N. Some of these options may not appear on your setup, so decide y/n for each question case by case. My choices are below.
twinkle@twinkle:~/tensorflow$ ./configure
WARNING: Running Bazel server needs to be killed, because the startup options are different.
You have bazel 0.14.1 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
/usr/bin/python
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 8
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
4. Build the pip package with bazel, then install it with pip:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
If you hit an error like:
Cuda Configuration Error: Cannot find libdevice.10.bc under /usr/local/cuda-8.0
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//crosstool': Traceback (most recent call last):
Fix: in /usr/local/cuda-8.0/nvvm/libdevice, copy libdevice.compute_50.10.bc to the name libdevice.10.bc, and put another copy in /usr/local/cuda-8.0/:
sudo cp /usr/local/cuda-8.0/nvvm/libdevice/libdevice.compute_50.10.bc /usr/local/cuda-8.0/nvvm/libdevice/libdevice.10.bc
sudo cp /usr/local/cuda-8.0/nvvm/libdevice/libdevice.10.bc /usr/local/cuda-8.0/
Then rerun the bazel build command above. The build takes quite a while; when it finishes, build the wheel and install it:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.0rc2-cp27-cp27mu-linux_x86_64.whl
In the bazel command, the --config=cuda flag enables GPU support; drop it if you do not need GPU support.
5. Test:
import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
c = a + b
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
With log_device_placement=True, the console log shows which device each op was placed on; on a working GPU build you should see entries like /gpu:0, and the script prints [2. 4. 6.].
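It is also worth checking explicitly that the wheel you installed was built with CUDA and can see the GPU. A small sketch against the TF 1.x API used throughout this guide (it requires the GPU build from this part, so run it on the target machine):

```python
import tensorflow as tf

print(tf.__version__)                 # the version of the wheel you installed
print(tf.test.is_built_with_cuda())   # True for a --config=cuda build
print(tf.test.gpu_device_name())      # e.g. '/gpu:0'; empty if no GPU is usable
```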