当前位置：首页 > article >正文

【深度学习-pytorch篇】4. 正则化方法（Regularization Techniques）

article 2025/9/10 5:19:03

正则化方法（Regularization Techniques）

1. 目标

理解什么是过拟合及其影响
掌握常见正则化技术：L2 正则化、Dropout、Batch Normalization、Early Stopping
能够使用 PyTorch 编程实现这些正则化方法并进行比较分析

2. 数据构造与任务设定

本实验是一个带噪声的回归任务，目标函数为 $\mathcal{N}(0, \sigma^2)$ 。使用均匀分布采样输入 $\in [-1, 1]$ 。

import numpy as np
import torch
import torch.utils.data as DataN_SAMPLES = 20
NOISE_RATE = 0.4train_x = np.linspace(-1, 1, N_SAMPLES)[:, np.newaxis]
train_y = train_x + np.random.normal(0, NOISE_RATE, train_x.shape)validate_x = np.linspace(-1, 1, N_SAMPLES // 2)[:, np.newaxis]
validate_y = validate_x + np.random.normal(0, NOISE_RATE, validate_x.shape)test_x = np.linspace(-1, 1, N_SAMPLES)[:, np.newaxis]
test_y = test_x + np.random.normal(0, NOISE_RATE, test_x.shape)# 转换为 Tensor
train_x = torch.tensor(train_x, dtype=torch.float32)
train_y = torch.tensor(train_y, dtype=torch.float32)
validate_x = torch.tensor(validate_x, dtype=torch.float32)
validate_y = torch.tensor(validate_y, dtype=torch.float32)
test_x = torch.tensor(test_x, dtype=torch.float32)
test_y = torch.tensor(test_y, dtype=torch.float32)train_dataset = Data.TensorDataset(train_x, train_y)
train_loader = Data.DataLoader(dataset=train_dataset, batch_size=10, shuffle=True)

3. 模型定义

3.1 原始 MLP（无正则化）

import torch.nn as nn
import torch.nn.init as initclass FC_Classifier(nn.Module):def __init__(self, input_dim=1, hidden_dim=100, output_dim=1):super().__init__()self.fc1 = nn.Linear(input_dim, hidden_dim)self.fc2 = nn.Linear(hidden_dim, output_dim)self.activation = nn.ReLU()self._init_weights()def _init_weights(self):init.normal_(self.fc1.weight, mean=0.0, std=0.1)init.constant_(self.fc1.bias, 0)init.normal_(self.fc2.weight, mean=0.0, std=0.1)init.constant_(self.fc2.bias, 0)def forward(self, x):x = self.activation(self.fc1(x))return self.fc2(x)

3.2 Dropout MLP

class DropoutMLP(nn.Module):def __init__(self, dropout_rate=0.5):super().__init__()self.fc1 = nn.Linear(1, 100)self.dropout = nn.Dropout(dropout_rate)self.fc2 = nn.Linear(100, 1)self.activation = nn.ReLU()self._init_weights()def _init_weights(self):init.normal_(self.fc1.weight, mean=0.0, std=0.1)init.constant_(self.fc1.bias, 0)init.normal_(self.fc2.weight, mean=0.0, std=0.1)init.constant_(self.fc2.bias, 0)def forward(self, x):x = self.dropout(self.fc1(x))x = self.activation(x)return self.fc2(x)

3.3 Batch Normalization MLP

class BNMLP(nn.Module):def __init__(self):super().__init__()self.bn_input = nn.BatchNorm1d(1)self.fc1 = nn.Linear(1, 100)self.bn_hidden = nn.BatchNorm1d(100)self.fc2 = nn.Linear(100, 1)self.activation = nn.ReLU()def forward(self, x):x = self.bn_input(x)x = self.fc1(x)x = self.bn_hidden(x)x = self.activation(x)return self.fc2(x)

4. Early Stopping 策略

当验证集误差连续若干轮无提升时，提前停止训练，避免过拟合。

max_patience = 5
patience = 0
best_val_loss = float("inf")
is_early_stop = False

5. RMSNorm 实现与讲解

5.1 原理说明

RMSNorm 是一种替代 LayerNorm 的轻量化归一化方法：

不减均值
仅用激活值的均方根进行归一化
不依赖 batch 维度

数学公式：

$\text{RMS}(x) = \sqrt{\frac{1}{n} \sum_{i=1}^n x_i^2}$

$\text{RMSNorm}(x) = \frac{x}{\text{RMS}(x) + \epsilon} \cdot \gamma$

其中 $\gamma$ 为可学习参数， $\epsilon$ 是一个很小的数避免除以 0。

5.2 代码实现

class RMSNorm(nn.Module):def __init__(self, hidden_size, eps=1e-6):super().__init__()self.weight = nn.Parameter(torch.ones(hidden_size))self.eps = epsdef forward(self, x):rms = torch.sqrt(torch.mean(x ** 2, dim=-1, keepdim=True) + self.eps)return self.weight * x / rms