
Fixing the "Read -1, expected xxx, errno = 1" error when running mpi4py

news 2025/7/5 7:59:21

Problem description

Code 1 (serial)

Code 2 (parallel)

Command used to run Code 2

Error message

Solutions

Solution 1

Solution 2

Problem description

Today I was learning mpi4py, and while comparing the two programs below I kept hitting errors:

Code 1 (serial)

import numpy as np
import time

np.random.seed(2)
size = 1000000
x1 = np.random.random(size)
x2 = np.random.random(size)
result = np.zeros(size, dtype=float)
# Time an element-by-element add in a plain Python loop
since = time.time()
for i in range(size):
    result[i] = x1[i] + x2[i]
end = time.time()
print(end - since)

Code 2 (parallel)

from mpi4py import MPI
import numpy as np
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

size = 1000000
x1 = np.random.random(size)
x2 = np.random.random(size)

if rank == 0:
    # Split `size` elements as evenly as possible across the processes
    ave, res = divmod(size, nprocs)
    count = [ave + 1 if p < res else ave for p in range(nprocs)]
    count = np.array(count)
    # Offset of each process's chunk = sum of the preceding counts
    displ = [sum(count[:p]) for p in range(nprocs)]
    displ = np.array(displ)
else:
    sendbuf = None
    # np.int was removed in NumPy 1.24; the builtin int gives the same dtype as np.array(count) on rank 0
    count = np.zeros(nprocs, dtype=int)
    displ = None

t0 = time.time()

# Every rank needs the counts to size its receive buffers
comm.Bcast(count, root=0)
recvbuf1 = np.zeros(count[rank])
recvbuf2 = np.zeros(count[rank])

comm.Scatterv([x1, count, displ, MPI.DOUBLE], recvbuf1, root=0)
comm.Scatterv([x2, count, displ, MPI.DOUBLE], recvbuf2, root=0)
print('After Scatterv, process {} has data:'.format(rank), recvbuf1)
print('After Scatterv, process {} has data:'.format(rank), recvbuf2)

# Element-by-element add of the local chunks
for i in range(recvbuf1.shape[0]):
    recvbuf1[i] += recvbuf2[i]

sendbuf2 = recvbuf1
recvbuf2 = np.zeros(sum(count))
comm.Gatherv(sendbuf2, [recvbuf2, count, displ, MPI.DOUBLE], root=0)

if comm.Get_rank() == 0:
    print('Computed in {:.3f} sec'.format(time.time() - t0))
    print('After Gatherv, process 0 has data:', recvbuf2)
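The rank-0 branch of Code 2 splits `size` elements as evenly as possible. The first `res` ranks get one extra element, and each displacement is the sum of the counts before it. The stdlib-only sketch below reproduces just that arithmetic so it can be checked without MPI. The name `split_counts` is hypothetical and is not part of mpi4py.

```python
from itertools import accumulate
from typing import List, Tuple


def split_counts(size: int, nprocs: int) -> Tuple[List[int], List[int]]:
    """Return (count, displ) as computed on rank 0 in Code 2 (hypothetical helper)."""
    ave, res = divmod(size, nprocs)
    # The first `res` ranks each take one extra element
    count = [ave + 1 if p < res else ave for p in range(nprocs)]
    # Displacement of rank p = number of elements owned by ranks 0..p-1
    displ = [0] + list(accumulate(count))[:-1]
    return count, displ
```

For example, `split_counts(10, 4)` gives counts `[3, 3, 2, 2]` and displacements `[0, 3, 6, 8]`. With Code 2's `size = 1000000` and 4 processes, each rank gets 250000 elements.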

Command used to run Code 2

# mpi_test.py is the file containing Code 2; the command is run as root
mpirun -np 4 --allow-run-as-root python mpi_test.py

Error message

This was my third attempt at fixing this error, and this time I finally found a solution. It was not easy, QAQ
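The error in the title ends with `errno = 1`. Decoding that number already hints at the cause. On Linux, errno 1 is `EPERM` ("Operation not permitted"), meaning a system call was refused for permission reasons rather than failing on bad data. Here is a quick stdlib check (the `strerror` wording may differ between platforms):

```python
import errno
import os

code = 1  # the value reported as "errno = 1" in the error message
print(errno.errorcode[code])  # symbolic name of the error code
print(os.strerror(code))      # human-readable description from the C library
```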

Solutions

Reference:

python - Possible buffer size limit in mpi4py Reduce() - Stack Overflow

The answer explains that the main cause of this error is:

The issue comes from the Cross-Memory Attach (CMA) system calls process_vm_readv() and process_vm_writev() that the shared-memory BTLs (Byte Transfer Layers, a.k.a. the things that move bytes between ranks) of Open MPI use to accelerate shared-memory communication between ranks that run on the same node by avoiding copying the data twice to and from a shared-memory buffer. This mechanism involves some setup overhead and is therefore only used for larger messages, which is why the problem only starts occurring after the messages size crosses the eager threshold.
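To see why Code 2 crosses this threshold, compare the size of each per-rank message with the eager limit. The sketch below does the arithmetic for one `Scatterv` call in Code 2. The 4096-byte limit is an assumption for illustration only. The real value depends on the Open MPI version and BTL; in versions that use the vader BTL it is the `btl_vader_eager_limit` parameter, which `ompi_info --param btl vader --level 9` can display.

```python
from typing import List

DOUBLE_BYTES = 8             # MPI.DOUBLE / float64 is 8 bytes
ASSUMED_EAGER_LIMIT = 4096   # assumption for illustration, not a queried value


def chunk_bytes(size: int, nprocs: int) -> List[int]:
    """Bytes each rank receives from one Scatterv of `size` doubles."""
    ave, res = divmod(size, nprocs)
    return [(ave + 1 if p < res else ave) * DOUBLE_BYTES for p in range(nprocs)]


def exceeds_eager_limit(nbytes: int, eager_limit: int = ASSUMED_EAGER_LIMIT) -> bool:
    """True if a message is larger than the (assumed) eager limit, i.e. it is
    large enough for the shared-memory BTL to consider its CMA single-copy path."""
    return nbytes > eager_limit


for nbytes in chunk_bytes(1000000, 4):
    print(nbytes, exceeds_eager_limit(nbytes))
```

With 4 processes, each rank receives 250000 doubles, or 2,000,000 bytes. That is orders of magnitude above any eager limit of a few kilobytes, so Code 2 hits the CMA path on every `Scatterv` and `Gatherv`.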

There are two solutions:

Solution 1

Pass the following flag to docker run:

--cap-add=SYS_PTRACE

However, the Docker container I use was created for me, and I do not have permission to run docker run. So the only option left was Solution 2.
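Solution 1 works by granting the container the `CAP_SYS_PTRACE` capability, which is not in Docker's default capability set. You can check whether it is in effect from inside the container. Read the `CapEff` bitmask in `/proc/self/status`, where `CAP_SYS_PTRACE` is bit 19 (as defined in `linux/capability.h`). Below is a minimal Linux-only sketch; the function name `has_sys_ptrace` is hypothetical.

```python
CAP_SYS_PTRACE = 19  # bit number from linux/capability.h


def has_sys_ptrace(status_text: str) -> bool:
    """Return True if the CapEff line of a /proc/<pid>/status dump has CAP_SYS_PTRACE set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # the mask is printed in hexadecimal
            return bool((mask >> CAP_SYS_PTRACE) & 1)
    raise ValueError("no CapEff line found")
```

Inside the container, call it as `has_sys_ptrace(open("/proc/self/status").read())`. A result of `False` means Solution 1 is not active, and CMA has to be disabled instead.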

Solution 2

Disable CMA.

For Open MPI versions before 1.8, add this parameter to mpirun:

mpirun --mca btl_sm_use_cma 0 ...

For Open MPI 1.8 and later, add this parameter to mpirun:

mpirun --mca btl_vader_single_copy_mechanism none ...
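The version rule above can be folded into the launch command. The sketch below uses hypothetical helpers (`parse_ompi_version`, `cma_off_flags`, and `mpirun_argv` are illustrative names, not part of mpi4py or Open MPI). It picks the MCA flag from the Open MPI version and builds the command used for Code 2. It assumes the version string looks like the output of `mpirun --version` (`mpirun (Open MPI) 4.1.2`) or of mpi4py's `MPI.Get_library_version()` (`Open MPI v4.1.2, ...`). It follows the 1.8 cut-off given above and treats 1.8 itself as "1.8 and later".

```python
import re
from typing import List, Tuple


def parse_ompi_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from an Open MPI version string."""
    m = re.search(r"Open MPI\)?\s+v?(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError("not an Open MPI version string: %r" % text)
    return int(m.group(1)), int(m.group(2))


def cma_off_flags(version: Tuple[int, int]) -> List[str]:
    """MCA flags that disable CMA, following the version rule in this article."""
    if version < (1, 8):
        return ["--mca", "btl_sm_use_cma", "0"]
    return ["--mca", "btl_vader_single_copy_mechanism", "none"]


def mpirun_argv(version: Tuple[int, int], script: str = "mpi_test.py",
                nprocs: int = 4) -> List[str]:
    """Command line for running Code 2 as root with CMA disabled."""
    return (["mpirun", "-np", str(nprocs), "--allow-run-as-root"]
            + cma_off_flags(version)
            + ["python", script])
```

The resulting list can be passed straight to `subprocess.run(..., check=True)`. Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<param>`. Exporting `OMPI_MCA_btl_vader_single_copy_mechanism=none` in the container therefore applies the same setting without editing every mpirun command.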

Below is a screenshot of the original answer, kept for future reference:

