V. Deep Learning: CNN
I. Image Basics
1. Basic image concepts
- An image is made up of pixels, and each pixel takes a value in the range [0, 255] (an unsigned 8-bit integer). The closer a value is to 0, the darker the pixel (towards black); the closer it is to 255, the brighter the pixel (towards white).
- In deep learning, most of the images we work with are color images, which consist of the three RGB channels.
2. Loading images
```python
import matplotlib.pyplot as plt
import numpy as np

# All zeros: a black image (uint8 so imshow reads values on the 0-255 scale)
img1 = np.zeros([200, 300, 3], dtype=np.uint8)
plt.imshow(img1)
plt.show()

# All 255: a white image
img2 = np.full([200, 300, 3], 255, dtype=np.uint8)
plt.imshow(img2)
plt.show()

# All 128: a gray image
img3 = np.full([200, 300, 3], 128, dtype=np.uint8)
plt.imshow(img3)
plt.show()
```
II. CNN
- A convolutional neural network (CNN) is a neural network that contains convolutional layers; the convolutional layers automatically learn and extract image features.
- A CNN is built from three main kinds of layers: convolutional layers, pooling layers, and fully connected layers (a minimal sketch follows this list).
  - Convolutional layers extract local features from the image.
  - Pooling layers drastically reduce the number of parameters (dimensionality reduction).
  - Fully connected layers produce the desired output.
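To make the three parts concrete, here is a minimal, hypothetical sketch in PyTorch (the layer sizes are illustrative assumptions, not taken from these notes):

```python
import torch
import torch.nn as nn

# Convolution extracts local features, pooling downsamples,
# and a fully connected layer maps the flattened features to class scores.
tiny_cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                               # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),                                          # fully connected layer
)

print(tiny_cnn(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```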
1. Convolutional layer
(1) Convolution computation
- input is the input image.
- filter is the convolution kernel, also called the filter matrix.
- Passing input through filter yields the rightmost image in the figure, which is called the feature map.
- At its core, the convolution operation is a dot product between the kernel and a local region of the input data (a naive implementation is sketched below).
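To make the sliding dot product explicit, here is a naive single-channel sketch (illustrative only; the toy image and kernel are assumptions, not from the original notes):

```python
import numpy as np

def conv2d_single_channel(image, kernel, stride=1):
    """Slide the kernel over one channel and take a dot product at each position."""
    H, W = image.shape
    F = kernel.shape[0]
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the current local region
            region = image[i * stride:i * stride + F, j * stride:j * stride + F]
            out[i, j] = np.sum(region * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])      # toy 2x2 kernel
print(conv2d_single_channel(image, kernel))       # 3x3 feature map
```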
(2) Padding
As the convolution computation above shows, the resulting feature map is noticeably smaller than the original image. To keep the image size unchanged after convolution, padding can be added around the original image (see the quick check below).
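As a quick check (a minimal sketch with assumed channel counts, not from the original), a 3x3 kernel with padding=1 preserves the spatial size, while padding=0 shrinks it:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # [B, C, H, W]

conv_no_pad = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=0)
conv_pad = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

print(conv_no_pad(x).shape)  # torch.Size([1, 8, 30, 30]) -- shrinks
print(conv_pad(x).shape)     # torch.Size([1, 8, 32, 32]) -- size preserved
```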
(3) Multi-channel convolution computation
- The kernel's height and width are hyperparameters; its number of channels is determined by the input (see the weight-shape check below).
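This is visible in the weight shape of nn.Conv2d, which is [out_channels, in_channels, kernel_h, kernel_w]; the channel counts below are assumed for illustration:

```python
import torch.nn as nn

# 3 input channels (e.g. RGB), 5 output channels, 3x3 kernels
layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3)

# Each of the 5 kernels has one slice per input channel
print(layer.weight.shape)  # torch.Size([5, 3, 3, 3])
print(layer.bias.shape)    # torch.Size([5])
```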
(4) Feature map size
The size of the output feature map depends on the following parameters:
- size: the kernel/filter size, usually odd, e.g. $1\times1$, $3\times3$, $5\times5$
- Padding: the zero-padding scheme
- Stride: the step size
Computation:
- Input image size: $W\times W$
- Kernel size: $F\times F$
- Stride: $S$
- Padding: $P$
- Output image size: $N\times N$
- Then $N = \frac{W-F+2P}{S} + 1$ (rounded down when the division is not exact)
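For example, with a $32\times32$ input, a $3\times3$ kernel, $P=0$ and $S=1$: $N = \frac{32-3+2\cdot0}{1} + 1 = 30$, which matches the first convolutional layer of the case study below.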
(5) API

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

img = plt.imread(r"F:\Maker\Learn_Systematically\6_Deep_learning\3_Convolutional_Neural_Networks_CNN\Meeting_at_the_Peak.jpg")
print(img.shape)  # [H, W, C]

img = torch.tensor(img).permute(2, 0, 1)  # [H, W, C] ---> [C, H, W]
img = img.to(torch.float32).unsqueeze(0)  # [C, H, W] ---> [B, C, H, W]
print(img.shape)

layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=(3, 3), stride=1, padding=0)
fm = layer(img)
print(fm.shape)  # (W - kernel_size + 2 * padding) / stride + 1
```
Output:
(5760, 2912, 3)
torch.Size([1, 3, 5760, 2912])
torch.Size([1, 5, 5758, 2910])
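These shapes follow the formula above: the channel dimension becomes 5 because out_channels=5, the height becomes $(5760-3+2\cdot0)/1+1 = 5758$, and the width becomes $(2912-3+2\cdot0)/1+1 = 2910$.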
2. Pooling layer
- The pooling layer (pooling) reduces the spatial dimensions, shrinking the model and speeding up computation.
- Pooling does not change the number of channels of the feature map.
- There are two kinds: max pooling and average pooling.
(1) API

```python
"""Max pooling"""
nn.MaxPool2d(kernel_size=2, stride=2, padding=1)

"""Average pooling"""
nn.AvgPool2d(kernel_size=2, stride=1, padding=0)
```
A. Single channel

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]]]).float()
print(inputs.shape)  # torch.Size([1, 3, 3])

pooling = nn.MaxPool2d(kernel_size=2, stride=1, padding=0)
print(pooling(inputs))

pooling = nn.AvgPool2d(kernel_size=2, stride=1, padding=0)
print(pooling(inputs))
```
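Both layers slide a $2\times2$ window with stride 1 over the $3\times3$ input, so each produces a $2\times2$ output: max pooling yields [[4, 5], [7, 8]] and average pooling yields [[2, 3], [5, 6]].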
B. Multi-channel

```python
import torch
import torch.nn as nn

inputs = torch.tensor([
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
    [[11, 22, 33], [44, 55, 66], [77, 88, 99]],
]).float()
print(inputs.shape)  # torch.Size([3, 3, 3])

pooling = nn.MaxPool2d(kernel_size=2, stride=1, padding=0)
print(pooling(inputs))

pooling = nn.AvgPool2d(kernel_size=2, stride=1, padding=0)
print(pooling(inputs))
```
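Here the input has shape [3, 3, 3] (three channels of size $3\times3$); pooling is applied to each channel independently, so both outputs have shape [3, 2, 2] and the channel count is unchanged.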
III. CNN Case Study
The network structure we need to build is as follows:
- Input shape: $32\times32$
- First convolutional layer: 3 input channels, 6 output channels, kernel size $3\times3$
- First pooling layer: input $30\times30$, output $15\times15$, kernel size $2\times2$, stride 2
- Second convolutional layer: 6 input channels, 16 output channels, kernel size $3\times3$
- Second pooling layer: input $13\times13$, output $6\times6$, kernel size $2\times2$, stride 2
- First fully connected layer: 576 input features, 120 output features (see the note after this list)
- Second fully connected layer: 120 input features, 84 output features
- Output layer: 84 input features, 10 output features
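The 576 comes from the second pooling layer: its output is 16 channels of size $6\times6$, so flattening gives $16\times6\times6 = 576$ input features for the first fully connected layer.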
A ReLU activation function is applied after each convolution to introduce non-linearity into the network.
The code to build (and train) the network is implemented as follows:
```python
import matplotlib
from torch.utils.data import DataLoader
matplotlib.use('Agg')  # non-interactive backend to avoid display/compatibility issues
import matplotlib.pyplot as plt
from torchvision.datasets import CIFAR10
from torchvision.transforms import Compose, ToTensor
import torch.nn as nn
import torch
from torchsummary import summary

# Load the dataset
train_data = CIFAR10(root='data', train=True, transform=Compose([ToTensor()]), download=True)
test_data = CIFAR10(root='data', train=False, transform=Compose([ToTensor()]), download=True)

# Inspect the dataset
print(train_data.data.shape)
print(test_data.data.shape)
print(train_data.classes)
print(train_data.class_to_idx)

# Show one image
plt.imshow(train_data.data[100])
plt.savefig('cifar_image.png')  # save the image to a file
plt.close()
print("Image saved to: cifar_image.png")


"""Model construction"""
class imgClassification(nn.Module):
    # Initialization
    def __init__(self):
        super(imgClassification, self).__init__()
        self.layer1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)
        self.pooling1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.layer2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3, stride=1, padding=0)
        self.pooling2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.layer3 = nn.Linear(in_features=576, out_features=120)
        self.layer4 = nn.Linear(in_features=120, out_features=84)
        self.out = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = self.pooling1(x)
        x = torch.relu(self.layer2(x))
        x = self.pooling2(x)
        x = torch.reshape(x, [x.size(0), -1])  # flatten to [B, 16*6*6] = [B, 576]
        x = torch.relu(self.layer3(x))
        x = torch.relu(self.layer4(x))
        out = self.out(x)
        return out

# Instantiate the model
model = imgClassification()
summary(model, input_size=(3, 32, 32), batch_size=1)


"""Model training"""
def train():
    # Loss function
    cri = nn.CrossEntropyLoss()
    # Optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.99))
    # Iterate over epochs
    epochs = 10
    loss_mean = []
    for epoch in range(epochs):
        dataloader = DataLoader(dataset=train_data, batch_size=2, shuffle=True)
        loss_sum = 0
        sample = 0.1  # counter starts at 0.1, which slightly biases the reported mean
        # Iterate over batches
        for x, y in dataloader:
            y_pre = model(x)
            loss = cri(y_pre, y)
            loss_sum += loss.item()
            sample += 1
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            break  # only the first batch of each epoch is trained (quick demo run)
        loss_mean.append(loss_sum / sample)
        print(loss_sum / sample)
    print('-' * 50)
    print(loss_mean)
    # Save the model weights
    torch.save(model.state_dict(), r'F:\Maker\Learn_Systematically\6_Deep_learning'
                                   r'\3_Convolutional_Neural_Networks_CNN\model.pth')

train()
```
Output:
(50000, 32, 32, 3)
(10000, 32, 32, 3)
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
{'airplane': 0, 'automobile': 1, 'bird': 2, 'cat': 3, 'deer': 4, 'dog': 5, 'frog': 6, 'horse': 7, 'ship': 8, 'truck': 9}
Image saved to: cifar_image.png
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [1, 6, 30, 30]             168
         MaxPool2d-2            [1, 6, 15, 15]               0
            Conv2d-3           [1, 16, 13, 13]             880
         MaxPool2d-4             [1, 16, 6, 6]               0
            Linear-5                  [1, 120]          69,240
            Linear-6                   [1, 84]          10,164
            Linear-7                   [1, 10]             850
================================================================
Total params: 81,302
Trainable params: 81,302
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.08
Params size (MB): 0.31
Estimated Total Size (MB): 0.40
----------------------------------------------------------------
2.1190285682678223
2.0913889191367407
2.1235652403397993
2.134627428921786
2.109860506924716
2.0705240423029116
2.112814729863947
2.0445303483442827
2.124096263538707
2.1400482004339043
--------------------------------------------------
[2.1190285682678223, 2.0913889191367407, 2.1235652403397993, 2.134627428921786, 2.109860506924716, 2.0705240423029116, 2.112814729863947, 2.0445303483442827, 2.124096263538707, 2.1400482004339043]
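After training, the saved weights could be reloaded and evaluated on test_data. The sketch below is a hypothetical addition, not part of the original code: the weight path and batch size are assumptions, and it assumes imgClassification and test_data from the script above are in scope.

```python
from torch.utils.data import DataLoader
import torch

model = imgClassification()
model.load_state_dict(torch.load('model.pth'))  # path is an assumption; use the path passed to torch.save
model.eval()

correct, total = 0, 0
with torch.no_grad():  # no gradients needed for evaluation
    for x, y in DataLoader(test_data, batch_size=64):
        pred = model(x).argmax(dim=1)        # predicted class index per sample
        correct += (pred == y).sum().item()
        total += y.size(0)

print(f"Test accuracy: {correct / total:.4f}")
```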