Table of Contents
- Preface
- Installing the Graphics Driver
- Uninstalling CUDA
- Installing CUDA
- Testing the Installation
- References
Preface
I have recently been studying how to install and use PaddlePaddle under different graphics driver versions, which also meant learning how to install and uninstall CUDA and cuDNN on Ubuntu. I recorded the process as I went, both to share it with others and to reinforce my own memory. This article uses uninstalling CUDA 8.0 and cuDNN 7.0.5, and installing CUDA 10.0 and cuDNN 7.4.2, as its running example.
Installing the Graphics Driver
Disable the nouveau driver
sudo vim /etc/modprobe.d/blacklist.conf
Append the following to the end of the file:
blacklist nouveau
options nouveau modeset=0
Then rebuild the initramfs so the blacklist takes effect, and reboot:
sudo update-initramfs -u
sudo reboot
After rebooting, run the following command; if it prints nothing, nouveau was disabled successfully:
lsmod | grep nouveau
Downloading the driver
Official download page: https://www.nvidia.cn/Download/index.aspx?lang=cn . Download the driver version that matches your graphics card; mine, for example, is an RTX 2070:
When the download completes you will have an installer; the file name varies with the version:
NVIDIA-Linux-x86_64-410.93.run
Uninstalling the old driver
All of the following steps must be done from a text console. Press Ctrl + Alt + F1 to switch to a virtual console and log in:
Stop the X-Window service with the following command, otherwise the driver cannot be installed:
sudo service lightdm stop
Run these three commands to remove the existing driver:
sudo apt-get remove --purge nvidia*
sudo chmod +x NVIDIA-Linux-x86_64-410.93.run
sudo ./NVIDIA-Linux-x86_64-410.93.run --uninstall
Installing the new driver
Run the driver installer and accept the defaults throughout:
sudo ./NVIDIA-Linux-x86_64-410.93.run
Restart the X-Window service:
sudo service lightdm start
Finally, reboot the system:
sudo reboot
Note: if the system gets stuck in a login loop after rebooting, the most likely cause is an incorrect driver version. Download the driver that matches the card actually installed in your machine.
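Once the system is back up, a quick way to confirm the driver actually loaded is nvidia-smi, which ships with the driver and reports the driver version and every GPU it can see. A minimal guarded sketch (the fallback message is just illustrative):

```shell
# Verify that the NVIDIA driver is installed and loaded.
# nvidia-smi is installed alongside the driver; if the command is missing,
# the driver installation did not complete.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi
  status="driver present"
else
  status="nvidia-smi not found - driver not installed"
fi
echo "$status"
```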
Uninstalling CUDA
Why uninstall CUDA first? Because I had swapped in an RTX 2070, and the previously installed CUDA 8.0 and cuDNN 7.0.5 could not be used with it. I needed CUDA 10.0 and cuDNN 7.4.2, so the old CUDA had to be removed first. Note that all of the following commands are run as root.
Uninstalling CUDA takes a single command: run the uninstall script that ships with CUDA. Find the script that matches your CUDA version:
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
After uninstalling, some leftover directories remain (CUDA 8.0 in my case). They can be removed in one go:
sudo rm -rf /usr/local/cuda-8.0/
That completes the CUDA uninstall.
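To double-check that nothing was left behind, a small sketch that lists any CUDA directories remaining under /usr/local (assuming the default install locations; the helper function name is my own):

```shell
# List any CUDA installation directories that survived the uninstall.
# Usage: list_leftovers /usr/local
list_leftovers() {
  found=""
  for d in "$1"/cuda*; do
    # The glob stays literal when nothing matches, so test for existence.
    [ -e "$d" ] && found="$found $d"
  done
  if [ -n "$found" ]; then
    echo "leftover CUDA directories:$found"
  else
    echo "no CUDA directories remain under $1"
  fi
}
list_leftovers /usr/local
```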
Installing CUDA
Versions to be installed: CUDA 10.0 and cuDNN 7.4.2.
All of the following installation steps are performed as root.
Downloading and installing CUDA
Download the CUDA build matching your system from the official CUDA 10 download page. The page looks like this:
After the download completes, make the file executable:
chmod +x cuda_10.0.130_410.48_linux.run
Run the installer to begin installation:
./cuda_10.0.130_410.48_linux.run
The installer first displays the license text; press Ctrl + C to jump to the end, or use the space bar to page through it. Then come the configuration prompts, annotated below:
(You must accept the license terms to continue.)
accept/decline/quit: accept
(Do not install the driver here: the latest driver is already installed, and saying yes could install an older driver and cause the login loop described above.)
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 10.0 Toolkit? (This is CUDA 10 itself and must be installed.)
(y)es/(n)o/(q)uit: y
Enter Toolkit Location (installation path; press Enter to accept the default)
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda? (agree to create the symlink)
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples? (no need; the samples already ship with the toolkit)
(y)es/(n)o/(q)uit: n
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ... (installation starts)
Once the installation finishes, configure the environment variables: open ~/.bashrc with vim ~/.bashrc and append the following lines:
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}
Then apply the changes with source ~/.bashrc.
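To confirm the new variables are actually in effect in the current shell, a quick self-contained check (it re-applies the same exports as above, assuming the default /usr/local/cuda-10.0 location):

```shell
# Re-create the ~/.bashrc settings and verify they landed in the environment.
export CUDA_HOME=/usr/local/cuda-10.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}

echo "CUDA_HOME=$CUDA_HOME"
# The toolkit's bin directory should now appear on PATH:
echo "$PATH" | tr ':' '\n' | grep 'cuda-10.0/bin'
```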
Check the installed version with nvcc -V:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Testing the Installation
Run the following commands:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
make
./deviceQuery
If everything is working, the output looks like this:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2070"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7950 MBytes (8335982592 bytes)
(36) Multiprocessors, ( 64) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1620 MHz (1.62 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
Downloading and installing cuDNN
Go to the cuDNN download page: https://developer.nvidia.com/rdp/cudnn-download , click Download, and choose a version (you will need to log in first). On the version-selection page, choose cuDNN Library for Linux:
The download is a tarball:
cudnn-10.0-linux-x64-v7.4.2.24.tgz
Extract it:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
Extraction yields the following files:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a
Copy these files into the CUDA directory with the following two commands:
cp cuda/lib64/* /usr/local/cuda-10.0/lib64/
cp cuda/include/* /usr/local/cuda-10.0/include/
After copying, check the installed cuDNN version with:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Testing the result
At this point CUDA 10.0 and cuDNN 7.4.2 are fully installed. To confirm they actually work, install the matching GPU build of PyTorch:
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.0-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
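Before running a full training script, a quick sanity check from the shell is to ask Python whether this torch build can see the card; torch.cuda.is_available() should print True on a working install. A guarded sketch that degrades gracefully if torch is missing:

```shell
# Check whether the GPU build of PyTorch is importable and can see a CUDA device.
if python3 -c "import torch" 2>/dev/null; then
  python3 -c "import torch; print('torch', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
else
  echo "torch is not installed in this environment"
fi
```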
Then test the setup with the following program:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def main():
    cudnn.benchmark = True
    torch.manual_seed(1)
    device = torch.device("cuda")
    kwargs = {'num_workers': 1, 'pin_memory': True}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=64, shuffle=True, **kwargs)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(1, 11):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == '__main__':
    main()
If output like the following appears, the installation succeeded:
Train Epoch: 1 [0/60000 (0%)] Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)] Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.255809
Train Epoch: 1 [3200/60000 (5%)] Loss: 2.224511
Train Epoch: 1 [3840/60000 (6%)] Loss: 2.216569
Train Epoch: 1 [4480/60000 (7%)] Loss: 2.181396
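A side note on the hard-coded 320 in x.view(-1, 320): it is the flattened feature size left after the two conv/pool stages have processed a 28x28 MNIST image. The arithmetic can be checked with a quick sketch (the conv output-size formula is (size - kernel)/stride + 1 with no padding):

```shell
# Trace the spatial size of a 28x28 MNIST input through the two conv/pool stages.
size=28
size=$(( (size - 5 + 1) / 2 ))   # conv1 (5x5): 28 -> 24, max_pool2d(2): 24 -> 12
size=$(( (size - 5 + 1) / 2 ))   # conv2 (5x5): 12 -> 8,  max_pool2d(2): 8 -> 4
flattened=$(( 20 * size * size ))  # conv2 has 20 output channels: 20 * 4 * 4
echo "$flattened"                  # 320, matching x.view(-1, 320) and nn.Linear(320, 50)
```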
References
https://developer.nvidia.com
https://www.cnblogs.com/luofeel/p/8654964.html
This concludes this article on installing and uninstalling CUDA and cuDNN on Ubuntu. For more related content, please search earlier articles on 腳本之家 or browse the related articles below.