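Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. For questions, contact the author on WeChat: lp9628 (mention CSDN). https://cloud.tencent.com/developer/article/1435833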
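(1) Check the CUDA version: cat /usr/local/cuda/version.txt (the version used in this experiment is CUDA Version 9.0.176)
(2) Check the cuDNN version: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2 (the version used in this experiment is cuDNN 7.0)
(3) Install pycuda directly: pip install pycuda==2017.1.1 (the version used in this experiment is pycuda 2017.1.1)
Note: available pycuda builds are listed on the [pycuda](http://www.lfd.uci.edu/~gohlke/pythonlibs/?cm_mc_uid=08085305845514542921829&cm_mc_sid_50200000=1456395916#pycuda) page (Christoph Gohlke's unofficial Windows binaries). Check there for which pycuda build matches which CUDA and Python version.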
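The grep in step (2) prints three #define lines that together make up the cuDNN version. As a minimal sketch, they can be combined into one version string. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The helper name parse_cudnn_version and the sample header text (including its patch level) are illustrative and not from the original post.

```python
import re

def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from the text of cudnn.h, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE)
        if m is None:
            return None
        parts.append(m.group(1))
    return ".".join(parts)

# Sample text for illustration only (the patch level is made up).
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
"""
print(parse_cudnn_version(sample))  # 7.0.5
```

On a real machine, pass in the contents of /usr/local/cuda/include/cudnn.h instead of the sample string. On newer cuDNN releases these macros may live in a separate cudnn_version.h, in which case the function returns None for cudnn.h.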
hello_gpu.py
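import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule

# Compile the CUDA C kernel at run time.
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    // Single block: the thread index is the element index.
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# drv.In copies an array to the GPU before the launch;
# drv.Out copies the result back to the host afterwards.
multiply_them(
    drv.Out(dest), drv.In(a), drv.In(b),
    block=(400, 1, 1), grid=(1, 1))

# Difference from the NumPy result; expected to be all zeros.
print(dest - a * b)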
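Alternatively, compute a global index from the block and thread indices, so the same 400 elements are split across 10 blocks of 40 threads each: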
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    // Single-block version: const int i = threadIdx.x;
    // Multi-block version: offset the thread index by the block's start.
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)

# 10 blocks x 40 threads = 400 threads, one per element.
multiply_them(
    drv.Out(dest), drv.In(a), drv.In(b),
    block=(40, 1, 1), grid=(10, 1))

print(dest - a * b)
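To see why the second version still touches every element exactly once, the index formula can be emulated on the CPU. This is a plain-Python sketch of the arithmetic only, with no GPU involved. The function name global_indices is illustrative.

```python
def global_indices(grid_x, block_x):
    """Emulate i = blockIdx.x * blockDim.x + threadIdx.x for a 1-D launch."""
    return [block_idx * block_x + thread_idx
            for block_idx in range(grid_x)
            for thread_idx in range(block_x)]

# Launch shape from the second example: 10 blocks of 40 threads.
idx = global_indices(grid_x=10, block_x=40)
print(len(idx), idx[0], idx[-1])  # 400 0 399
print(idx == list(range(400)))    # True
```

Block 0 covers elements 0 to 39, block 1 covers 40 to 79, and so on, so the 400 threads map one-to-one onto the 400 array elements.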
Comparing CPU and GPU computation speed: test.py
import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from timeit import default_timer as timer
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void func(float *a, float *b, int N)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads past the end of the array do nothing.
    if (i >= N)
    {
        return;
    }
    float temp_a = a[i];
    float temp_b = b[i];
    a[i] = (temp_a * 10 + 2) * ((temp_b + 2) * 10 - 5) * 5;
    // a[i] = a[i] + b[i];
}
""")

func = mod.get_function("func")

def test(N):
    # N = 1024 * 1024 * 90   # 1024 * 1024 floats = 4 MB
    print("N = %d" % N)
    a = np.random.randn(N).astype(np.float32)
    b = np.random.randn(N).astype(np.float32)
    # Keep a copy of a for the CPU run, because the kernel overwrites a.
    aa = a.copy()

    # GPU run. The timing includes the host<->device copies done by InOut/In.
    nThreads = 256
    nBlocks = (N + nThreads - 1) // nThreads  # round up
    start = timer()
    func(
        drv.InOut(a), drv.In(b), np.int32(N),  # np.int32 matches the kernel's int N
        block=(nThreads, 1, 1), grid=(nBlocks, 1))
    run_time = timer() - start
    print("gpu run time %f seconds " % run_time)

    # CPU run
    start = timer()
    aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
    run_time = timer() - start
    print("cpu run time %f seconds " % run_time)

    # Check the result: smallest and largest GPU-minus-CPU difference.
    r = a - aa
    print(r.min(), r.max())

def main():
    for n in range(1, 10):
        N = 1024 * 1024 * (n * 10)
        print("------------%d---------------" % n)
        test(N)

if __name__ == '__main__':
    main()
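The block count in test.py is a ceiling division: enough blocks of 256 threads to cover N elements, with the kernel's `i >= N` check turning any leftover threads into no-ops. A small CPU-only sketch of that arithmetic follows. The function name launch_shape is illustrative.

```python
def launch_shape(n, threads_per_block=256):
    """Return (blocks, idle_threads) for a 1-D launch covering n elements."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    idle = blocks * threads_per_block - n  # threads that take the `i >= N` early return
    return blocks, idle

print(launch_shape(1024 * 1024 * 10))  # (40960, 0)
print(launch_shape(1000))              # (4, 24)
```

Every N used in test.py is a multiple of 256, so no thread is idle there. The bounds check only matters for sizes such as 1000, where the last block has 24 threads with nothing to do.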
Results (the last line of each run is the minimum and maximum of the GPU-minus-CPU difference):
------------1---------------
N = 10485760
gpu run time 0.023215 seconds
cpu run time 0.068797 seconds
-0.0014648438 0.0014648438
------------2---------------
N = 20971520
gpu run time 0.032089 seconds
cpu run time 0.124529 seconds
-0.0014648438 0.0014648438
------------3---------------
N = 31457280
gpu run time 0.046203 seconds
cpu run time 0.187157 seconds
-0.0014648438 0.0014648438
------------4---------------
N = 41943040
gpu run time 0.055805 seconds
cpu run time 0.244947 seconds
-0.0014648438 0.0014648438
------------5---------------
N = 52428800
gpu run time 0.075256 seconds
cpu run time 0.317744 seconds
-0.0014648438 0.0014648438
------------6---------------
N = 62914560
gpu run time 0.080560 seconds
cpu run time 0.378609 seconds
-0.0014648438 0.0014648438
------------7---------------
N = 73400320
gpu run time 0.101881 seconds
cpu run time 0.439889 seconds
-0.0014648438 0.0014648438
------------8---------------
N = 83886080
gpu run time 0.112525 seconds
cpu run time 0.504098 seconds
-0.0014648438 0.0014648438
------------9---------------
N = 94371840
gpu run time 0.139425 seconds
cpu run time 0.576029 seconds
-0.0014648438 0.0014648438
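The timings above can be turned into CPU/GPU time ratios with a few lines of Python. The numbers are copied from the results listing. The GPU figures include the host-to-device and device-to-host copies, because the timer wraps the whole func(...) call, so this sketch does not measure kernel time alone.

```python
# (N, gpu_seconds, cpu_seconds) copied from the results above.
runs = [
    (10485760, 0.023215, 0.068797),
    (20971520, 0.032089, 0.124529),
    (31457280, 0.046203, 0.187157),
    (41943040, 0.055805, 0.244947),
    (52428800, 0.075256, 0.317744),
    (62914560, 0.080560, 0.378609),
    (73400320, 0.101881, 0.439889),
    (83886080, 0.112525, 0.504098),
    (94371840, 0.139425, 0.576029),
]

# Ratio > 1 means the GPU run (including transfers) was faster.
speedups = [cpu / gpu for _, gpu, cpu in runs]
for (n, _, _), s in zip(runs, speedups):
    print("N = %d: CPU/GPU time ratio %.2f" % (n, s))
```

On this machine the GPU run is roughly 3 to 4.7 times faster than the NumPy expression across the tested sizes.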
(1) GPU shared memory, a pycuda usage tutorial: https://blog.csdn.net/qq_36387683/article/details/81075870
(2) pycuda tutorial: https://documen.tician.de/pycuda/tutorial.html
(3) For the theoretical background, see: https://blog.csdn.net/hujingshuang/article/details/53097222
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved.