
Numba getting-started examples

One-dimensional vector addition: C = A + B

These examples were run on an Ubuntu machine with a recent NVIDIA GPU.

Environment setup:

conda create -n numba_cuda_python3.10 python=3.10
conda activate numba_cuda_python3.10
conda install numba
conda install cudatoolkit
conda install -c nvidia cuda-python    # or: conda install nvidia::cuda-python
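
To verify the setup before running the examples, a quick check like the following (assuming the environment above is active) prints the Numba version and the detected GPUs:

import numba
from numba import cuda

print('numba version:', numba.__version__)
print('CUDA available:', cuda.is_available())
cuda.detect()  # prints a summary of the detected NVIDIA GPUs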

Example 1: source code

C[i] = A[i] + B[i]

hello_numba_cpu_01.py

import time
import numpy as np
from numba import jit
from numba import njit


def f_py(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


@jit
def f_bin(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


@njit
def f_pure_bin(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


if __name__ == "__main__":
    np.random.seed(1234)
    N = 1024 * 1024 * 128
    a_h = np.random.random(N)
    b_h = np.random.random(N)
    c_h1 = np.random.random(N)
    c_h2 = np.random.random(N)
    c_h3 = np.random.random(N)

    # Warm-up calls: the first invocation triggers JIT compilation,
    # so the timed calls below measure execution only.
    f_bin(a_h, b_h, c_h1, N)
    print('a_h  =', a_h)
    print('b_h  =', b_h)
    print('c_h1 =', c_h1)
    f_pure_bin(a_h, b_h, c_h2, N)
    print('c_h2 =', c_h2)

    s1 = time.time()
    f_py(a_h, b_h, c_h1, N)
    e1 = time.time()
    print('time   py:', e1 - s1)

    s1 = time.time()
    f_bin(a_h, b_h, c_h2, N)
    e1 = time.time()
    print('time  jit:', e1 - s1)

    s1 = time.time()
    f_pure_bin(a_h, b_h, c_h3, N)
    e1 = time.time()
    print('time njit:', e1 - s1)

    print('c_h1 =', c_h1)
    print('c_h2 =', c_h2)
    print('c_h3 =', c_h3)
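
A note on the decorators: @njit is shorthand for @jit(nopython=True), which compiles the whole function to machine code (recent Numba releases make nopython the default for plain @jit as well). The script also calls f_bin and f_pure_bin once before the timed section, so the timings exclude the one-time compilation cost. A minimal sketch of the equivalence:

from numba import jit, njit

@jit(nopython=True)
def add_one(x):
    return x + 1

@njit  # shorthand for @jit(nopython=True)
def add_two(x):
    return x + 2

print(add_one(1), add_two(1))  # -> 2 3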

Measured running time: pure Python takes about 26 s, while the JIT-compiled version takes about 0.23 s, roughly a 100x speedup.
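
For comparison, the same operation written as a single NumPy expression is a natural baseline (a minimal sketch with the same N as above; the timing includes allocating the output array):

import time
import numpy as np

N = 1024 * 1024 * 128
a = np.random.random(N)
b = np.random.random(N)

s = time.time()
c = a + b  # NumPy's vectorized elementwise add
print('time numpy:', time.time() - s)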


Example 2: source code

C[i] = A[i] + B[i]

hello_numba_gpu_02.py

import time
import numpy as np
from numba import jit
from numba import njit
from numba import cuda


def f_py(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


@jit
def f_bin(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


@njit
def f_pure_bin(a, b, c, N):
    for i in range(N):
        c[i] = a[i] + b[i]


@cuda.jit
def f_gpu(a, b, c):
    # like threadIdx.x + (blockIdx.x * blockDim.x)
    tid = cuda.grid(1)
    size = len(c)
    if tid < size:
        c[tid] = a[tid] + b[tid]


if __name__ == "__main__":
    np.random.seed(1234)
    N = 1024 * 1024 * 128
    a_d = cuda.to_device(np.random.random(N))
    b_d = cuda.to_device(np.random.random(N))
    c_d = cuda.device_array_like(a_d)
    print('a_d =', a_d.copy_to_host())
    print('b_d =', b_d.copy_to_host())
    print('c_d =', c_d.copy_to_host())

    a_h = a_d.copy_to_host()
    b_h = b_d.copy_to_host()
    c_h = c_d.copy_to_host()
    f_bin(a_h, b_h, c_h, N)
    print('a_h =', a_h)
    print('b_h =', b_h)
    print('c_h =', c_h)

    c_h = np.random.random(N)
    f_pure_bin(a_h, b_h, c_h, N)
    print('c_h =', c_h)

    f_gpu.forall(len(a_d))(a_d, b_d, c_d)
    print('c_d =', c_d.copy_to_host())

    # Enough threads per block for several warps per block
    nthreads = 256
    # Enough blocks to cover the entire vector depending on its length
    nblocks = (len(a_d) // nthreads) + 1
    f_gpu[nblocks, nthreads](a_d, b_d, c_d)
    print('c_d =', c_d.copy_to_host())

    s1 = time.time()
    f_py(a_h, b_h, c_h, N)
    e1 = time.time()
    print('time   py:', e1 - s1)

    s1 = time.time()
    f_bin(a_h, b_h, c_h, N)
    e1 = time.time()
    print('time  jit:', e1 - s1)

    s1 = time.time()
    f_pure_bin(a_h, b_h, c_h, N)
    e1 = time.time()
    print('time njit:', e1 - s1)

    s1 = time.time()
    f_gpu.forall(len(a_d))(a_d, b_d, c_d)
    e1 = time.time()
    print('time gpu1:', e1 - s1)

    s1 = time.time()
    f_gpu[nblocks, nthreads](a_d, b_d, c_d)
    e1 = time.time()
    print('time gpu2:', e1 - s1)
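
Two launch styles appear in the script: f_gpu.forall(len(a_d)) lets Numba pick a launch configuration with one thread per element, while f_gpu[nblocks, nthreads] sets the grid explicitly. With nthreads = 256 and N = 1024*1024*128 = 134217728, nblocks = 134217728 // 256 + 1 = 524289, so the grid carries slightly more threads than there are elements; the if tid < size guard in the kernel keeps the surplus threads from writing out of bounds.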

The GPU speedup looks dramatic here, on the order of tens of thousands of times, but see the timing caveat below.
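
One caveat about the GPU timings: CUDA kernel launches are asynchronous, so reading time.time() right after a launch mostly measures launch overhead rather than kernel execution. A minimal self-contained sketch (add_kernel is illustrative, with the same logic as f_gpu) that waits with cuda.synchronize() before stopping the clock:

import time
import numpy as np
from numba import cuda


@cuda.jit
def add_kernel(a, b, c):  # illustrative kernel, same logic as f_gpu above
    tid = cuda.grid(1)
    if tid < c.size:
        c[tid] = a[tid] + b[tid]


N = 1024 * 1024 * 128
a_d = cuda.to_device(np.random.random(N))
b_d = cuda.to_device(np.random.random(N))
c_d = cuda.device_array_like(a_d)

nthreads = 256
nblocks = (N + nthreads - 1) // nthreads  # ceiling division covers every element

add_kernel[nblocks, nthreads](a_d, b_d, c_d)  # warm-up call compiles the kernel

s = time.time()
add_kernel[nblocks, nthreads](a_d, b_d, c_d)
cuda.synchronize()  # wait for the kernel to finish before stopping the clock
print('time gpu (synchronized):', time.time() - s)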

