Reproducing BatchNorm1d and understanding its num_features parameter

news 2025/9/15 5:01:08

0. Intro

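Taking PyTorch as the example: the num_features parameter of BatchNorm1d determines which values get normalized together, but I can never remember how, so I'm writing this blog post to help myself understand QAQ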

1. Reproducing `nn.BatchNorm1d(num_features=1)`

Suppose we have an input tensor:

import torch
from torch import nn

input = torch.tensor([[[1.,2.,3.,4.]],[[0.,0.,0.,0.]]])
print(input.shape)
# torch.Size([2, 1, 4])

About nn.BatchNorm1d(num_features=1)

The signature looks like this:
torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
And it is used like this:

BN1 = nn.BatchNorm1d(num_features=1,affine=False,eps=0)   
# input has only 1 feature (1 channel), each feature has length 4, and the first dimension is the batch
print("---BN1---")
print(torch.squeeze(BN1(input)))

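Note 1: passing eps=0 zeroes the $\epsilon$ term in the batch norm formula, $y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$. That term is a safeguard that keeps the denominator away from zero; its default value is 1e-5.
Note 2: $\gamma$ and $\beta$ in the formula start out as 1 and 0 respectively, so setting affine=False is enough to take them out of the picture. Keep in mind that affine defaults to True.
The input shape matches the [B,C,L] format that BatchNorm1d expects, and num_features=1 corresponds to C.
The output of the code above is: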

---BN1---
tensor([[-0.1690,  0.5071,  1.1832,  1.8593],[-0.8452, -0.8452, -0.8452, -0.8452]])
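
To check the two notes on eps and affine for myself, here is a small sketch (the names `bn_plain`, `bn_affine` and `bn_eps` are my own, not from the original code). It assumes a freshly constructed layer in training mode, where the affine weight and bias have not been trained yet.

```python
import torch
from torch import nn

x = torch.tensor([[[1., 2., 3., 4.]], [[0., 0., 0., 0.]]])  # shape [2, 1, 4]

# Reference: no affine parameters, eps switched off (same setup as BN1).
bn_plain = nn.BatchNorm1d(num_features=1, affine=False, eps=0)
out_plain = bn_plain(x)

# affine=True (the default): weight starts at 1 and bias at 0,
# so a fresh layer should give the same output.
bn_affine = nn.BatchNorm1d(num_features=1, eps=0)
out_affine = bn_affine(x).detach()

# Default eps=1e-5: the denominator becomes sqrt(var + 1e-5).
bn_eps = nn.BatchNorm1d(num_features=1, affine=False)
out_eps = bn_eps(x)

same_as_affine = torch.allclose(out_plain, out_affine)
max_eps_gap = (out_plain - out_eps).abs().max().item()
print(same_as_affine)   # whether affine=True changes anything at init
print(max_eps_gap)      # how far the default eps moves the result
```

So for a reproduction like this one, affine=False and eps=0 mostly keep the comparison clean; the default eps should only shift the numbers slightly.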

Reproducing the result of nn.BatchNorm1d(num_features=1):

ans = (input-torch.mean(torch.flatten(input)))/torch.sqrt(torch.var(torch.flatten(input),unbiased=False))
print(torch.squeeze(ans))

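Note 1: torch.flatten() is the important part. It mirrors what the BN layer does: for each feature, it pools that feature's values from every sample in the batch into one flat list and normalizes over it, whether the feature is a sequence or a matrix. Flattening the whole tensor works here only because there is a single feature (C=1).
Note 2: unbiased=False tells torch.var to divide by n, i.e. we want the plain (biased) variance rather than the unbiased estimate.
Its output is: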

tensor([[-0.1690,  0.5071,  1.1832,  1.8593],[-0.8452, -0.8452, -0.8452, -0.8452]])

Exactly the same.
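
With more than one feature, flattening everything no longer works, because each channel needs its own mean and variance. A minimal sketch of the same reproduction for a made-up input with C=3 (the random tensor and its shape are my assumptions, not part of the example above):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(5, 3, 4)  # hypothetical input: B=5, C=3, L=4

bn = nn.BatchNorm1d(num_features=3, affine=False, eps=0)
expected = bn(x)

# Per-channel statistics: reduce over batch (dim 0) and length (dim 2),
# keeping the channel dimension so the result broadcasts against x.
mean = x.mean(dim=(0, 2), keepdim=True)                 # shape [1, 3, 1]
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)   # shape [1, 3, 1]
manual = (x - mean) / torch.sqrt(var)

matches = torch.allclose(expected, manual, atol=1e-5)
print(matches)
```

Reducing over dims 0 and 2 is the per-channel version of the flatten trick: for C=1 the two are the same thing.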

2. Reproducing `nn.BatchNorm1d(num_features=4)`

Again we start from an input tensor, the same one as above, copied over:

input = torch.tensor([[[1.,2.,3.,4.]],[[0.,0.,0.,0.]]])
print(input.shape)
# torch.Size([2, 1, 4])

About nn.BatchNorm1d(num_features=4)

This time it is used like this:

BN2 = nn.BatchNorm1d(num_features=4,affine=False,eps=0)
print("---BN2---")
print(BN2(torch.squeeze(input)))

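Note 1: torch.squeeze is required. It changes the tensor's shape from torch.Size([2, 1, 4]) to torch.Size([2, 4]), which matches the [B,C] format that BatchNorm1d expects, and num_features=4 corresponds to C.
The output of the code above is: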

---BN2---
tensor([[ 1.,  1.,  1.,  1.],[-1., -1., -1., -1.]])
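
A side note on the squeeze step, as a sketch rather than the one right way to do it: calling squeeze with an explicit dim avoids accidentally dropping the batch dimension when B=1, and permuting to [B,C,L] = [2,4,1] should be an equivalent way to put the 4 values on the channel axis. The names `x2d` and `x_perm` are mine.

```python
import torch
from torch import nn

x3d = torch.tensor([[[1., 2., 3., 4.]], [[0., 0., 0., 0.]]])  # [2, 1, 4]

# Option 1: squeeze only dim 1, so the batch dim can never be removed.
x2d = x3d.squeeze(dim=1)        # [2, 4]  -> read as [B, C]
# Option 2: keep three dims but move the 4 values onto the channel axis.
x_perm = x3d.permute(0, 2, 1)   # [2, 4, 1] -> read as [B, C, L]

bn = nn.BatchNorm1d(num_features=4, affine=False, eps=0)
out_2d = bn(x2d)
out_perm = bn(x_perm).squeeze(-1)

same = torch.allclose(out_2d, out_perm)
print(x2d.shape, x_perm.shape, same)
```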

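Reproduction

Here is the key point for understanding num_features=4. For the current input data (shape [B,C] = [2,4] after the squeeze), each feature is now a single value per sample (not a sequence or a matrix), so we can work one feature out by hand:
- Take the last feature as an example: its values across the batch are [4,0], which gives mean=2 and sqrt(var)=2, so ([4,0]-mean)/sqrt(var)=[1,-1]
- The other 3 features can be computed the same way
Exactly the same.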
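
The hand calculation above can also be written as a few lines of code. This is a minimal sketch under the same settings (affine=False, eps=0, layer in training mode); `x2d` and `manual` are names I made up for it.

```python
import torch
from torch import nn

x = torch.tensor([[[1., 2., 3., 4.]], [[0., 0., 0., 0.]]])
x2d = torch.squeeze(x)  # shape [2, 4]: B=2, C=4

bn2 = nn.BatchNorm1d(num_features=4, affine=False, eps=0)
expected = bn2(x2d)

# Each column is one feature; statistics are taken over the batch only.
mean = x2d.mean(dim=0)                 # per-feature mean: 0.5, 1.0, 1.5, 2.0
var = x2d.var(dim=0, unbiased=False)   # per-feature variance: 0.25, 1.0, 2.25, 4.0
manual = (x2d - mean) / torch.sqrt(var)

print(manual)
print(torch.allclose(expected, manual))
```

Compared with section 1, the only thing that changed is which axis the statistics are reduced over: there it was everything belonging to the single channel, here it is just the batch axis for each of the 4 features.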

The full code from above:

import torch
from torch import nn

input = torch.tensor([[[1.,2.,3.,4.]],[[0.,0.,0.,0.]]])
print(input.shape)

BN1 = nn.BatchNorm1d(num_features=1,affine=False,eps=0)   # each feature has length 4; the first dimension is the batch
print("---BN1---")
print(torch.squeeze(BN1(input)))
print("---BN1 Repeat---")
ans = (input-torch.mean(torch.flatten(input)))/torch.sqrt(torch.var(torch.flatten(input),unbiased=False) )
print(torch.squeeze(ans))

BN2 = nn.BatchNorm1d(num_features=4,affine=False,eps=0)
print("---BN2---")
print(BN2(torch.squeeze(input)))
# BN2 is just worked out by hand