RuntimeError: Unexpected error from cudaGetDeviceCount

  • 0. Introduction
  • 1. Temporary workaround

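0. Introduction

With vllm-0.4.2, multi-GPU deployment ran normally. After upgrading to vllm-0.5.1, the worker processes fail at startup with the following error: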

(VllmWorkerProcess pid=30692) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30694) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30693) Traceback (most recent call last):
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30693)     self.run()
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30693)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30693)     worker = worker_factory()
(VllmWorkerProcess pid=30693)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30693)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30693)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30693)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30693)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30693)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30693)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30693)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30693)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30693)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30693)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30693)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30693)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30693)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30693)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30693)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30693)     torch._C._cuda_init()
(VllmWorkerProcess pid=30693) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30692) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30692) Traceback (most recent call last):
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30692)     self.run()
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30692)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30692)     worker = worker_factory()
(VllmWorkerProcess pid=30692)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30692)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30692)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30692)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30692)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30692)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30692)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30692)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30692)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30692)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30692)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30692)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30692)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30692)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30692)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30692)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30692)     torch._C._cuda_init()
(VllmWorkerProcess pid=30692) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30694) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30694) Traceback (most recent call last):
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30694)     self.run()
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30694)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30694)     worker = worker_factory()
(VllmWorkerProcess pid=30694)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30694)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30694)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30694)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30694)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30694)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30694)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30694)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30694)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30694)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30694)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30694)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30694)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30694)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30694)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30694)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30694)     torch._C._cuda_init()
(VllmWorkerProcess pid=30694) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
ERROR 07-12 08:16:26 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 30693 died, exit code: 1
INFO 07-12 08:16:26 multiproc_worker_utils.py:123] Killing local vLLM worker processes

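1. Temporary workaround

Open the selector module installed in the environment:

vi /root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py

---
In `get_attn_backend`, comment out the call to `which_attn_to_use` and hard-code `backend = _Backend.XFORMERS`. This skips the `torch.cuda.get_device_capability()` check inside `which_attn_to_use`, which is where the traceback above fails:

    # backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
    #                             sliding_window, dtype, kv_cache_dtype,
    #                             block_size)
    backend = _Backend.XFORMERS
---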
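A less invasive variant of the same idea may be to request the xFormers backend through an environment variable rather than editing a file under site-packages. The sketch below assumes that the installed vLLM reads `VLLM_ATTENTION_BACKEND` inside `which_attn_to_use` and that `XFORMERS` is an accepted value. Check this against your own `selector.py` before relying on it. The helper name `force_xformers_backend` is hypothetical and not part of vLLM.

```python
import os
from typing import MutableMapping, Optional


def force_xformers_backend(env: Optional[MutableMapping[str, str]] = None) -> str:
    """Request the xFormers attention backend via an environment variable.

    Assumption: the installed vLLM honors VLLM_ATTENTION_BACKEND. If it does
    not, this has no effect and the selector.py edit is still required.
    """
    # Default to the real process environment; a plain dict can be passed for testing.
    target = os.environ if env is None else env
    target["VLLM_ATTENTION_BACKEND"] = "XFORMERS"
    return target["VLLM_ATTENTION_BACKEND"]
```

Call `force_xformers_backend()` before importing vLLM or creating the engine, so that the worker processes inherit the variable. If the error persists, fall back to the file edit described in this section.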

Done!

http://www.lryc.cn/news/399231.html
