Original article: https://juejin.cn/post/7479993915041660968
The article follows:
What is cuDNN, and why install it? This article covers NVIDIA hardware and drivers (the NVIDIA driver), the CUDA Toolkit, the cuDNN library family, and TensorRT, explaining the relationships and roles of hardware, drivers, and software at each layer. Tencent Cloud Studio is used for the demonstration, including installing and configuring PyTorch with GPU acceleration.
Cloud Studio is a browser-based integrated development environment (cloud IDE) that provides developers with a stable cloud workstation, with both CPU and GPU access. No installation is needed; just open a browser anywhere, anytime. Cloud Studio offers a free CPU tier (50,000 minutes per month) and a free GPU tier (one Tesla T4 with 16 GB of memory, 10,000 minutes per month). This article uses the Cloud Studio GPU environment for the demonstration.
To create a workspace, choose:
Workspace Templates -> AI Templates -> Pytorch2.0.0 -> Free Basic tier -> Confirm
Under "High-performance workspaces" you will then see the newly created GPU workspace (here named Pytorch2.0.0 gssrak) with a green dot indicating it is Running.
Enter the workspace; it finishes loading in under a minute.
The NVIDIA driver is the driver program written specifically for NVIDIA GPUs. Only with the NVIDIA driver installed can the GPU work correctly, whether for display output (Studio/professional or gaming cards) or for accelerating scientific computation (data-center cards). It is also the prerequisite for installing the CUDA Toolkit or cuDNN later.
Run nvidia-smi to check:
(base) root@VM-24-95-ubuntu:/workspace# nvidia-smi
Mon Mar 10 12:13:25 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:09.0 Off | 0 |
| N/A 31C P8 10W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Driver Version: 525.105.17 means the installed NVIDIA driver is version 525.105.17.
CUDA Version: 12.0 means the highest CUDA version this driver supports is 12.0. CUDA 12.0 and anything below it (CUDA 11.8, CUDA 11.7, CUDA 10.0, etc.) will work; versions above it (CUDA 12.1, CUDA 12.8, etc.) are not supported.
The CUDA Toolkit is NVIDIA's complete development tool suite for writing and optimizing CUDA programs. It includes the compiler (nvcc), a debugger, the runtime library (cudart), profiling tools, and various math and compute libraries. Note that if you only need to run TensorFlow or PyTorch, you do not need to install the (full) CUDA Toolkit: the subset of cuDNN bundled when installing PyTorch or TensorFlow is enough for GPU-accelerated computation. The CUDA Toolkit is only needed when developing CUDA kernels, compiling GPU-accelerated code yourself (such as the Apex library), and similar cases.
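The driver-compatibility rule above amounts to a simple version comparison. A minimal sketch (the function name is mine, for illustration only):

```python
def cuda_compatible(toolkit_version: str, driver_max_version: str) -> bool:
    """True if a CUDA toolkit/runtime of toolkit_version can run on a driver
    whose nvidia-smi "CUDA Version" (maximum supported) is driver_max_version."""
    def as_tuple(v: str) -> tuple:
        return tuple(int(part) for part in v.split("."))
    return as_tuple(toolkit_version) <= as_tuple(driver_max_version)

print(cuda_compatible("11.7", "12.0"))  # True  -> usable with driver 525.105.17
print(cuda_compatible("12.1", "12.0"))  # False -> not supported by this driver
```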
Cloud Studio comes with CUDA Toolkit 11.7 installed and configured by default:
(base) root@VM-24-95-ubuntu:/workspace# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
Run echo $PATH and check that it includes the path /usr/local/cuda/bin:
(base) root@VM-24-95-ubuntu:/workspace# echo $PATH
/etc/.hai/cloud_studio/vendor/modules/code-oss-dev/bin/remote-cli:/root/miniforge3/bin:/root/miniforge3/condabin:/root/miniforge3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Run echo $LD_LIBRARY_PATH and check that it includes the path /usr/local/cuda/lib64:
(base) root@VM-24-95-ubuntu:/workspace# echo $LD_LIBRARY_PATH
/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of basic deep-neural-network operations, providing optimized implementations of the computations that appear most frequently in DNN applications. cuDNN is what actually performs the GPU acceleration in TensorFlow, PyTorch, and large-model serving platforms.
cuDNN documentation: docs.nvidia.com/deeplearnin…
If you used the Pytorch2.0.0 workspace template as described above, there is no need to install cuDNN separately: Cloud Studio already ships a configured GPU build of PyTorch, which bundles the subset of cuDNN it needs.
Check that CUDA is available:
python -c "import torch;print(torch.cuda.is_available())"
Check that cuDNN is enabled:
python -c "import torch;print(torch.backends.cudnn.enabled)"
Check the cuDNN version:
python -c "import torch;print(torch.backends.cudnn.version())"
(base) root@VM-24-95-ubuntu:/workspace# python -c "import torch;print(torch.cuda.is_available())"
True
(base) root@VM-24-95-ubuntu:/workspace# python -c "import torch;print(torch.backends.cudnn.enabled)"
True
(base) root@VM-24-95-ubuntu:/workspace# python -c "import torch;print(torch.backends.cudnn.version())"
8500
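The integer 8500 returned by torch.backends.cudnn.version() follows cuDNN 8.x's version encoding, major*1000 + minor*100 + patchlevel (the mnistCUDNN log later in this article likewise reports "8500 (8.5.0)"). A small decoder sketch:

```python
def decode_cudnn_version(v: int) -> str:
    # cuDNN 8.x encodes its version as major*1000 + minor*100 + patchlevel
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"

print(decode_cudnn_version(8500))  # 8.5.0
```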
PyTorch bundles a subset of cuDNN; list its shared libraries with:
find $(python -c "import torch; print(torch.__path__[0])") -name "*cudnn*so*"
(base) root@VM-24-95-ubuntu:/workspace# find $(python -c "import torch; print(torch.__path__[0])") -name "*cudnn*so*"
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_adv_infer.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_cnn_train.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_adv_train.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_ops_train.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_cnn_infer.so.8
/root/miniforge3/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
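These bundled libraries are what PyTorch dispatches convolutions to. As a minimal sketch assuming only PyTorch is installed (it falls back to CPU when no GPU is present), the following runs a cuDNN-style MNIST-sized convolution; the benchmark flag tells cuDNN to autotune the algorithm, much like the cudnnFindConvolutionForwardAlgorithm calls in the mnistCUDNN log below:

```python
import torch
import torch.nn as nn

# benchmark=True lets cuDNN time the available convolution algorithms once
# and cache the fastest choice for fixed input shapes.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = nn.Conv2d(1, 20, kernel_size=5).to(device)   # same shape as the sample's conv1
x = torch.randn(1, 1, 28, 28, device=device)        # one MNIST-sized image
y = conv(x)
print(y.shape)  # torch.Size([1, 20, 24, 24])
```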
To verify a cuDNN installation with NVIDIA's samples, install:
apt -y install libcudnn8-samples libfreeimage-dev build-essential
Since the cuDNN bundled with Cloud Studio's PyTorch is version 8500 (8.5.0), libcudnn8-samples is the matching package here. Build and run the MNIST sample:
cd /usr/src/cudnn_samples_v8/mnistCUDNN && make clean && make
./mnistCUDNN
If the output ends with Test passed!, cuDNN is installed and working.
logs of `./mnistCUDNN`
(base) root@VM-24-95-ubuntu:/usr/src/cudnn_samples_v8/mnistCUDNN# make clean && make
rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 11070
Linking agains cublasLt = true
CUDA VERSION: 11070
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86 87
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o fp16_dev.o -c fp16_dev.cu
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcublasLt -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
(base) root@VM-24-95-ubuntu:/usr/src/cudnn_samples_v8/mnistCUDNN# ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8500 , CUDNN_VERSION from cudnn.h : 8500 (8.5.0)
Host compiler version : GCC 9.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 40 Capabilities 7.5, SmClock 1590.0 Mhz, MemSize (Mb) 14928, MemClock 5001.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.027136 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.027680 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.059392 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095232 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.149504 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 5.357568 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.088064 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.088352 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.129024 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.135936 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.144864 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 5.752384 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.025984 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.030496 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.061536 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.085920 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.086048 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.118688 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.080128 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.086432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.087552 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.124960 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.135456 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.143360 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.028000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.030048 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.080224 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.086048 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.093568 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 2.026400 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.104480 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.121888 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.129344 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.133152 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.200096 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.919584 time requiring 64000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.032352 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.036704 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.037408 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.079872 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.083968 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.085984 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.083360 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.120096 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.124992 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.127648 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.193344 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.282880 time requiring 64000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Since most of Cloud Studio's AI templates rely on the cuDNN bundled with the AI framework, and Cloud Studio workspaces ship with conda, installing a standalone cuDNN via pip install is the recommended route when you need one:
pip install nvidia-cudnn-cu11
pip install nvidia-cudnn-cu11==9.x.y.z
Alternatively, download the archive and unpack it into the CUDA directory:
wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.8.0.87_cuda11-archive.tar.xz
tar -xf cudnn-linux-x86_64-9.8.0.87_cuda11-archive.tar.xz --strip-components=1 -C /usr/local/cuda
Or install with conda:
conda install cudnn cuda-version=<cuda-major-version> -c nvidia
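After a pip install, you can check where the wheel placed the cuDNN shared libraries. A small sketch, assuming the standard nvidia-cudnn-cu11 wheel layout (site-packages/nvidia/cudnn/lib):

```python
import glob
import os
import sysconfig

# pip wheels such as nvidia-cudnn-cu11 unpack their shared libraries under
# site-packages/nvidia/cudnn/lib (assumed layout; adjust if your wheel differs).
site_packages = sysconfig.get_paths()["purelib"]
cudnn_libs = glob.glob(
    os.path.join(site_packages, "nvidia", "cudnn", "lib", "libcudnn*.so*")
)
print(cudnn_libs)  # empty list if the package is not installed
```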
TensorRT is an inference acceleration library that can substantially speed up model inference in production.
pip install tensorrt-cu11
Verify the installation:
python -c "import tensorrt;print(tensorrt.__version__);assert tensorrt.Builder(tensorrt.Logger())"
The pip install tensorrt-cu11 command installs TensorRT 10 built for CUDA 11 by default, while pip install tensorrt installs TensorRT 10 built for CUDA 12. To pin a specific version, use pip install tensorrt-cu11==10.0.1 or pip install tensorrt==8.5.3.1.
logs of `pip install tensorrt-cu11`
(base) root@VM-24-95-ubuntu:/workspace# pip install tensorrt-cu11
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting tensorrt-cu11
Downloading http://mirrors.tencentyun.com/pypi/packages/ad/04/0d6cffca481309ca0f6904446b4a075ddbf759f249851b54938c43fa6982/tensorrt_cu11-10.9.0.34.tar.gz (18 kB)
Preparing metadata (setup.py) ... done
Collecting tensorrt_cu11_libs==10.9.0.34 (from tensorrt-cu11)
Downloading http://mirrors.tencentyun.com/pypi/packages/12/3f/8962914e14e265711f262ad961b437630acacbe794f730f1b6503fe1cec8/tensorrt_cu11_libs-10.9.0.34.tar.gz (704 bytes)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting tensorrt_cu11_bindings==10.9.0.34 (from tensorrt-cu11)
Downloading http://mirrors.tencentyun.com/pypi/packages/6e/3c/056876197cf050b064fbc4a89a5f72e092ecf7a4f1454f0ca7c579fbc109/tensorrt_cu11_bindings-10.9.0.34-cp310-none-manylinux_2_28_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 28.1 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11 (from tensorrt_cu11_libs==10.9.0.34->tensorrt-cu11)
Downloading http://mirrors.tencentyun.com/pypi/packages/a6/ec/a540f28b31de7bc1ed49eecc72035d4cb77db88ead1d42f7bfa5ae407ac6/nvidia_cuda_runtime_cu11-11.8.89-py3-none-manylinux2014_x86_64.whl (875 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 875.6/875.6 kB 24.6 MB/s eta 0:00:00
Building wheels for collected packages: tensorrt-cu11, tensorrt_cu11_libs
Building wheel for tensorrt-cu11 (setup.py) ... done
Created wheel for tensorrt-cu11: filename=tensorrt_cu11-10.9.0.34-py2.py3-none-any.whl size=17466 sha256=48b8117c9b58cef409a1838af20124df8e830c0f91ccb256ce68a34ccb8cbab7
Stored in directory: /root/.cache/pip/wheels/74/2a/8a/58fb3d73239359b35886927883f9ede3f874dfe000f4847afd
Building wheel for tensorrt_cu11_libs (pyproject.toml) ... done
Created wheel for tensorrt_cu11_libs: filename=tensorrt_cu11_libs-10.9.0.34-py2.py3-none-manylinux_2_28_x86_64.whl size=2053243630 sha256=bf85dc722a08f2b28bc206a147737f74c62bf24f93842ea0ab5b6b4094cb0af7
Stored in directory: /root/.cache/pip/wheels/50/fe/b9/a6137a71b76c0282920b71420d97a280aa7388573cbee6ec28
Successfully built tensorrt-cu11 tensorrt_cu11_libs
Installing collected packages: tensorrt_cu11_bindings, nvidia-cuda-runtime-cu11, tensorrt_cu11_libs, tensorrt-cu11
Successfully installed nvidia-cuda-runtime-cu11-11.8.89 tensorrt-cu11-10.9.0.34 tensorrt_cu11_bindings-10.9.0.34 tensorrt_cu11_libs-10.9.0.34
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
(base) root@VM-24-95-ubuntu:/workspace# python -c "import tensorrt;print(tensorrt.__version__);assert tensorrt.Builder(tensorrt.Logger())"
10.9.0.34
[03/11/2025-01:49:50] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
logs of `pip install tensorrt==8.5.3.1`
(base) root@VM-24-95-ubuntu:/workspace# pip install tensorrt==8.5.3.1
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting tensorrt==8.5.3.1
Downloading http://mirrors.tencentyun.com/pypi/packages/3e/d5/5f9dd454a89f5bf09c3740c649ba6c8dd685cae98a1255299a2e1dbac606/tensorrt-8.5.3.1-cp310-none-manylinux_2_17_x86_64.whl (549.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 549.5/549.5 MB 47.7 MB/s eta 0:00:00
Requirement already satisfied: nvidia-cuda-runtime-cu11 in /root/miniforge3/lib/python3.10/site-packages (from tensorrt==8.5.3.1) (11.8.89)
Collecting nvidia-cudnn-cu11 (from tensorrt==8.5.3.1)
Downloading http://mirrors.tencentyun.com/pypi/packages/22/32/6385ef0da5e01553e3b8ad55428fd4824cbff29ff941185082b17f030c9e/nvidia_cudnn_cu11-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (434.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 434.5/434.5 MB 72.8 MB/s eta 0:00:00
Collecting nvidia-cublas-cu11 (from tensorrt==8.5.3.1)
Downloading http://mirrors.tencentyun.com/pypi/packages/ea/2e/9d99c60771d275ecf6c914a612e9a577f740a615bc826bec132368e1d3ae/nvidia_cublas_cu11-11.11.3.6-py3-none-manylinux2014_x86_64.whl (417.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 417.9/417.9 MB 63.4 MB/s eta 0:00:00
Installing collected packages: nvidia-cublas-cu11, nvidia-cudnn-cu11, tensorrt
Successfully installed nvidia-cublas-cu11-11.11.3.6 nvidia-cudnn-cu11-9.8.0.87 tensorrt-8.5.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
(base) root@VM-24-95-ubuntu:/workspace# python -c "import tensorrt;print(tensorrt.__version__);assert tensorrt.Builder(tensorrt.Logger())"
8.5.3.1
[03/11/2025-02:03:52] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
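The [TRT] [W] warning in the logs above about CUDA lazy loading can be addressed by setting CUDA_MODULE_LOADING before the process first initializes CUDA (supported from CUDA 11.7 onward). A minimal sketch:

```python
import os

# Must be set before the first CUDA call in this process, i.e. before
# importing/initializing tensorrt or torch.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"
print(os.environ["CUDA_MODULE_LOADING"])  # LAZY
```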
This article is a repost; see the original at the link above.
For any infringement, please contact cloudcommunity@tencent.com for removal.