
Python Learning Day 34

article 2025/9/15 15:30:56

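GPU Training and the Class __call__ Method

Key points recap:

Checking CPU performance: look at the architecture generation, core count, and thread count
Checking GPU performance: look at the memory (VRAM) size, the product tier, and the architecture generation
Training on a GPU: move both the data and the model to the GPU device
The class __call__ method: why the forward pass can simply be written as self.fc1(x)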

import wmi  # third-party WMI wrapper (Windows only)

c = wmi.WMI()  # create a WMI object
processors = c.Win32_Processor()
for processor in processors:
    print(f"CPU model: {processor.Name}")
    print(f"CPU cores: {processor.NumberOfCores}")
    print(f"CPU threads: {processor.NumberOfLogicalProcessors}")
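
For the GPU side of the recap (memory size and architecture generation), PyTorch can report the device properties directly. The sketch below assumes PyTorch is installed. `describe_gpu` is a hypothetical helper name that is not part of the original notes, and it returns None on a machine without CUDA.

```python
import torch


def describe_gpu(index=0):
    """Return basic properties of a CUDA device, or None if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(index)
    return {
        "name": props.name,
        "memory_gb": props.total_memory / 1024**3,  # total_memory is in bytes
        "compute_capability": f"{props.major}.{props.minor}",
        "multiprocessors": props.multi_processor_count,
    }


info = describe_gpu()
if info is None:
    print("No CUDA-capable GPU detected")
else:
    print(f"GPU model: {info['name']}")
    print(f"GPU memory: {info['memory_gb']:.1f} GB")
    print(f"Compute capability: {info['compute_capability']}")
    print(f"Streaming multiprocessors: {info['multiprocessors']}")
```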

The following was added on top of the previous day's code; everything else is unchanged.

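# Select the GPU device (fall back to the CPU if CUDA is unavailable)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Convert the data to PyTorch tensors and move them to the device
# For classification, cross-entropy loss requires labels of type long
# Tensors have a to(device) method that moves them to the specified device
X_train = torch.FloatTensor(X_train).to(device)
y_train = torch.LongTensor(y_train).to(device)
X_test = torch.FloatTensor(X_test).to(device)
y_test = torch.LongTensor(y_test).to(device)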
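
The recap says that both the data and the model must be on the GPU, while the lines above move only the data. A minimal sketch of the model side follows. The `MLP` class and its layer sizes (4 inputs, 10 hidden units, 3 classes) are assumptions for illustration, since the previous day's model is not shown here.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class MLP(nn.Module):  # hypothetical model; the layer sizes are assumed
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 10)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(10, 3)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))


# .to(device) moves all of the model's parameters to the chosen device
model = MLP().to(device)

# Inputs must be on the same device as the model's parameters
x = torch.randn(8, 4).to(device)
logits = model(x)
print(logits.shape, logits.device)
```

If the model stays on the CPU while the tensors are on the GPU (or the other way round), the forward pass fails with a device-mismatch error, so the two moves belong together.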

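Compared with the CPU, GPU computation carries three extra sources of time overhead, which is why the GPU run here took longer than the CPU run:

Data transfer overhead (CPU memory <-> GPU memory)
Kernel launch overhead (the time needed to launch each GPU kernel)
Underutilization: the amount of computation and the batch size are too small to keep the GPU busy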
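
When comparing CPU and GPU times, keep in mind that CUDA operations run asynchronously, so the timer needs a `torch.cuda.synchronize()` before it stops. The sketch below is one way to observe the overheads listed above. `time_matmul` and the matrix sizes are assumptions for illustration, and the measured numbers will differ from machine to machine.

```python
import time

import torch


def time_matmul(device, n, repeats=10):
    """Average seconds per n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up so one-time initialization is not measured
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish
    return (time.perf_counter() - start) / repeats


devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda:0"))

for n in (64, 512):
    for dev in devices:
        print(f"n={n:4d} {dev}: {time_matmul(dev, n) * 1e3:.3f} ms")
```

On small matrices the fixed overheads tend to dominate, so the GPU may show no advantage; the gap usually turns in the GPU's favor as n grows.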

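GPU training is a good fit for:

Large datasets: for example, an image dataset with thousands of images or more, each of high dimensionality.

Large models: for example, deep convolutional networks (CNNs such as ResNet and VGG) or Transformer models, which have millions or even billions of parameters and require an enormous amount of computation.

A suitable batch size: one large enough to make full use of the GPU's parallelism, so that compute capacity is not left idle.

Complex, parallelizable operations: large numbers of matrix multiplications, convolutions, and so on.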

The __call__ method

# __call__ without arguments
class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1
        return self.count

# Usage example
counter = Counter()
print(counter())  # Output: 1
print(counter())  # Output: 2
print(counter.count)  # Output: 2

# __call__ with arguments
class Adder:
    def __call__(self, a, b):
        print("Sing, dance, basketball, rap")
        return a + b

adder = Adder()
print(adder(3, 5))  # prints the message above, then 8
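
This is what makes `self.fc1(x)` work in a forward pass: a layer is an object whose class defines `__call__`, and that `__call__` hands off to `forward`. The sketch below is a deliberately simplified stand-in written in plain Python. `MiniModule`, `Scale`, and `Net` are hypothetical names, and the real `torch.nn.Module.__call__` does additional work (such as running hooks) around the call to `forward`.

```python
class MiniModule:
    """Simplified stand-in for nn.Module: calling the object runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError


class Scale(MiniModule):
    """Toy 'layer' that multiplies its input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return x * self.factor


class Net(MiniModule):
    def __init__(self):
        self.fc1 = Scale(2)
        self.fc2 = Scale(10)

    def forward(self, x):
        x = self.fc1(x)  # same as self.fc1.__call__(x), which calls self.fc1.forward(x)
        return self.fc2(x)


net = Net()
print(net(3))  # Output: 60
```

Calling the object rather than `forward` directly is the usual convention with real PyTorch modules too, since it lets the framework run its extra bookkeeping.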
