
Getting Started with Metal: Adding Large Arrays with GPU Parallel Computing

news 2025/7/10 10:35:57

I. Programming guide PDF download links (Chinese and English documents)

1. Metal Programming Guide PDF link
https://github.com/dennie-lee/ios_tech_record/raw/main/Metal学习PDF/Metal 编程指南.pdf
2. Metal Shading Language (MSL) Programming Guide PDF link
https://github.com/dennie-lee/ios_tech_record/raw/main/Metal学习PDF/Metal 着色语言指南.pdf
3. Supplement: official API documentation link
https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu

II. Overview

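This article uses element-wise addition of two arrays of equal length as an entry point to Metal compute (kernel) functions running on the GPU. Rendering functions, that is, vertex and fragment shader functions, are not covered here; they will be introduced in a separate article. The example walks through the basic tasks that every Metal app performs. You will see how to convert a simple function written in C into the Metal Shading Language (MSL) so that it can run on the GPU. You then create a pipeline that prepares the MSL function to run, and create data objects the GPU can access. To run the pipeline on your data, you create a command buffer, encode commands into it, and commit the buffer to a command queue. Metal then sends the commands to the GPU for execution.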
(The figure below is Apple's official diagram of how a compute function is executed.)
[Figure: Apple's official diagram of the compute workflow]

III. Comparing the array-addition function in C and in MSL

1. C function: adding two arrays

void add_arrays(const float* inA, const float* inB, float* result, int length)
{
    // Serial version: one loop iteration per element
    for (int index = 0; index < length; index++)
    {
        result[index] = inA[index] + inB[index];
    }
}
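The Metal timing logged later in sendComputeCommand is easier to judge next to a CPU baseline. The sketch below times the serial C version on its own. It is a minimal illustration, not the author's benchmark code. It repeats the function above so the block is self-contained. The helper time_add_arrays_cpu is hypothetical. It uses the standard C clock(), which reports processor time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Serial reference implementation (same as the C function above). */
void add_arrays(const float *inA, const float *inB, float *result, int length)
{
    for (int index = 0; index < length; index++) {
        result[index] = inA[index] + inB[index];
    }
}

/* Hypothetical helper: runs add_arrays once and returns the elapsed
   processor time in seconds, measured with clock(). */
double time_add_arrays_cpu(const float *inA, const float *inB,
                           float *result, int length)
{
    assert(length >= 0);
    clock_t start = clock();
    add_arrays(inA, inB, result, length);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For very small arrays the measured time is often 0 because clock() is coarse. Arrays as large as the ones used with Metal give a more meaningful comparison.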

2. MSL function: adding two arrays

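#include <metal_stdlib>
using namespace metal;

// Runs once per GPU thread; index is this thread's position in the grid
kernel void add_arrays(device const float* inA,
                       device const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // No loop: the dispatched grid supplies one thread per element
    result[index] = inA[index] + inB[index];
}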

Keyword explanations (see the MSL programming guide PDF, which can be downloaded from the links above):
[Figure: keyword explanations excerpted from the MSL programming guide]
[[thread_position_in_grid]] is also described in the guide. For a more detailed explanation, see this article: https://juejin.cn/post/7085633906501746724. It also covers the related attributes [[threadgroup_position_in_grid]] and [[threads_per_threadgroup]].
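These attributes are related. For a 1D dispatch with uniform threadgroups, `thread_position_in_grid = threadgroup_position_in_grid * threads_per_threadgroup + thread_position_in_threadgroup`. The C sketch below emulates such a dispatch on the CPU. The inner function is the kernel body, and the loops play the role of the GPU's threadgroups and threads. The names emulate_dispatch_1d and add_arrays_one_thread are hypothetical. The loops run sequentially here, whereas a GPU runs threads concurrently in no guaranteed order.

```c
#include <assert.h>

/* The MSL kernel body as a plain C function for one grid position.
   On the GPU, index would come from [[thread_position_in_grid]]. */
static void add_arrays_one_thread(const float *inA, const float *inB,
                                  float *result, unsigned int index)
{
    result[index] = inA[index] + inB[index];
}

/* CPU emulation of a 1D dispatch with uniform threadgroups:
   tg    ~ [[threadgroup_position_in_grid]]
   local ~ [[thread_position_in_threadgroup]]
   threadsPerThreadgroup ~ [[threads_per_threadgroup]] */
void emulate_dispatch_1d(const float *inA, const float *inB, float *result,
                         unsigned int length, unsigned int threadsPerThreadgroup)
{
    assert(threadsPerThreadgroup > 0);
    /* Round up so every element is covered by some threadgroup. */
    unsigned int threadgroupCount =
        (length + threadsPerThreadgroup - 1) / threadsPerThreadgroup;

    for (unsigned int tg = 0; tg < threadgroupCount; tg++) {
        for (unsigned int local = 0; local < threadsPerThreadgroup; local++) {
            unsigned int gid = tg * threadsPerThreadgroup + local;
            /* The last group may extend past the array, so guard the index. */
            if (gid < length) {
                add_arrays_one_thread(inA, inB, result, gid);
            }
        }
    }
}
```

Because the threadgroup count is rounded up, a kernel launched with `dispatchThreadgroups:threadsPerThreadgroup:` would need a similar bounds check, for example by receiving the length as an extra argument. The article's `dispatchThreads:threadsPerThreadgroup:` call sizes the grid to exactly arrayLength threads, so its kernel can omit the check.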

IV. Walkthrough of the key code

1. Initialization: load the .metal shader files and create a pipeline state object for the compute function

// _mDevice must already hold a device here, e.g. _mDevice = MTLCreateSystemDefaultDevice();
// Load the shader files with a .metal file extension in the project
id<MTLLibrary> newDefaultLibrary = [_mDevice newDefaultLibrary];
if (newDefaultLibrary == nil) {
    NSLog(@"Failed to find the default library");
    return nil;
}
id<MTLFunction> newFunction = [newDefaultLibrary newFunctionWithName:@"add_arrays"];
if (newFunction == nil) {
    NSLog(@"Failed to find the adder function");
    return nil;
}
NSError *error = nil;
// Create a compute pipeline state object from the kernel function
// defined in the project's .metal file (the Add file)
_mAddFunctionPSO = [_mDevice newComputePipelineStateWithFunction:newFunction error:&error];
if (_mAddFunctionPSO == nil) {
    // If Metal API validation is enabled, you can find out more about what
    // went wrong. (Validation is enabled by default when a debug build is
    // run from Xcode.)
    NSLog(@"Failed to create pipeline state object, error: %@", error);
    return nil;
}
// Command queue
_mCommandQueue = [_mDevice newCommandQueue];
if (_mCommandQueue == nil) {
    NSLog(@"Failed to find command queue");
    return nil;
}

2. Initialize the array data

// Initialize the array data
- (void)prepareData {
    // bufferSize is assumed to be arrayLength * sizeof(float), defined elsewhere
    // Shared storage lets the CPU write the data directly through buffer.contents
    _mBufferA = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    _mBufferB = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    _mBufferResult = [_mDevice newBufferWithLength:bufferSize options:MTLResourceStorageModeShared];
    [self generateRandomFloatData:_mBufferA];
    [self generateRandomFloatData:_mBufferB];
}

// Fill a buffer with random floats in [0, 1]
- (void)generateRandomFloatData:(id<MTLBuffer>)buffer {
    float *dataPtr = buffer.contents;
    for (int index = 0; index < arrayLength; index++) {
        dataPtr[index] = (float)rand() / (float)RAND_MAX;
    }
}
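The Objective-C code uses bufferSize and arrayLength but does not show their definitions. The C sketch below makes the assumed relationship explicit: the buffer needs at least arrayLength * sizeof(float) bytes. It then runs the same fill loop on a plain pointer that stands in for buffer.contents. The functions buffer_size_for and fill_random_floats are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Bytes needed to hold arrayLength floats (the minimum bufferSize). */
size_t buffer_size_for(unsigned int arrayLength)
{
    return (size_t)arrayLength * sizeof(float);
}

/* Same fill logic as generateRandomFloatData:, applied to a raw pointer.
   (float)rand() / (float)RAND_MAX yields values in [0, 1]. */
void fill_random_floats(float *dataPtr, unsigned int arrayLength)
{
    assert(dataPtr != NULL || arrayLength == 0);
    for (unsigned int index = 0; index < arrayLength; index++) {
        dataPtr[index] = (float)rand() / (float)RAND_MAX;
    }
}
```

If srand is never called, rand() produces the same sequence on every run. This is convenient for reproducible test data.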

3. Encode the command arguments and submit the work to the GPU

- (void)sendComputeCommand {
    // Create a command buffer to hold commands
    id<MTLCommandBuffer> commandBuffer = [_mCommandQueue commandBuffer];
    assert(commandBuffer != nil);
    // Start a compute pass; the pipeline and its arguments are encoded next
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    assert(computeEncoder != nil);
    NSTimeInterval startTimeInterval = [[NSDate date] timeIntervalSince1970];
    [self encodeAdderCommand:computeEncoder];
    // Arguments are encoded; end the compute pass
    [computeEncoder endEncoding];
    // Submit the command buffer for execution
    [commandBuffer commit];
    // Block until the GPU has finished
    [commandBuffer waitUntilCompleted];
    NSTimeInterval endTimeInterval = [[NSDate date] timeIntervalSince1970];
    // The measured time covers encoding, submission, and GPU execution
    NSLog(@"Time to add two arrays of length %u (Metal): %f s", arrayLength, (endTimeInterval - startTimeInterval));
    // Check the results
    [self verifyResults];
}

- (void)encodeAdderCommand:(id<MTLComputeCommandEncoder>)computeEncoder {
    // Encode the pipeline state object and its arguments
    [computeEncoder setComputePipelineState:_mAddFunctionPSO];
    // atIndex:0 matches the first MSL parameter (inA). The index can also be
    // written explicitly as [[buffer(0)]]; see the MSL guide for details.
    [computeEncoder setBuffer:_mBufferA offset:0 atIndex:0];
    // atIndex:1 matches the second MSL parameter (inB), i.e. [[buffer(1)]]
    [computeEncoder setBuffer:_mBufferB offset:0 atIndex:1];
    // atIndex:2 matches the third MSL parameter (result), i.e. [[buffer(2)]]
    [computeEncoder setBuffer:_mBufferResult offset:0 atIndex:2];
    // Total thread count: one per element. Each thread's index is its
    // [[thread_position_in_grid]] (see the keyword notes in section III.2).
    MTLSize gridSize = MTLSizeMake(arrayLength, 1, 1);
    // Threadgroup size: the pipeline's maximum, capped at the array length
    NSUInteger threadGroupSize = _mAddFunctionPSO.maxTotalThreadsPerThreadgroup;
    if (threadGroupSize > arrayLength) {
        threadGroupSize = arrayLength;
    }
    MTLSize threadsPerThreadgroup = MTLSizeMake(threadGroupSize, 1, 1);
    // dispatchThreads: launches exactly gridSize threads (requires GPU support
    // for non-uniform threadgroup sizes), so the kernel needs no bounds check
    [computeEncoder dispatchThreads:gridSize threadsPerThreadgroup:threadsPerThreadgroup];
}
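sendComputeCommand calls verifyResults, but the article does not list that method. The C sketch below shows one plausible way to write the check. It assumes plain float pointers, which in the Objective-C code would come from the contents of _mBufferA, _mBufferB, and _mBufferResult. It recomputes each element on the CPU and compares it with the GPU output. The function verify_results and its tolerance parameter are hypothetical.

```c
#include <assert.h>

/* Hypothetical verification step: returns the index of the first element
   whose GPU result differs from inA + inB by more than tolerance, or -1
   if every element matches. NaN results are reported as mismatches. */
long verify_results(const float *inA, const float *inB, const float *result,
                    unsigned long length, float tolerance)
{
    assert(tolerance >= 0.0f);
    for (unsigned long index = 0; index < length; index++) {
        float expected = inA[index] + inB[index];
        float diff = result[index] - expected;
        if (diff < 0.0f) {
            diff = -diff;
        }
        /* Written as !(diff <= tolerance) so that a NaN diff also fails. */
        if (!(diff <= tolerance)) {
            return (long)index;
        }
    }
    return -1;
}
```

Passing 0.0f gives an exact comparison. A single float addition normally produces the same value on the CPU and the GPU. A small tolerance becomes useful once a kernel does more complex arithmetic.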