
[AI from Scratch] Lesson 24: Convolutional Neural Network (CNN) Architecture Design

news 2025/7/6 7:41:51


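What you will learn in this lesson

Understand the core ideas behind convolutional neural networks
Master the principles of the three core CNN operations
Learn to build a CNN model with TensorFlow
Implement an image recognition system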

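Before you start

Environment requirements

Python 3.8 to 3.10 (TensorFlow 2.8.0 supports Python 3.7 to 3.10)
Packages to install:
- tensorflow==2.8.0
- matplotlib==3.4.0
- opencv-python==4.5.5.* (PyPI releases of opencv-python use four-part versions such as 4.5.5.64)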

Prerequisites

Neural network basics (Lesson 23)
Matrix operations (Lesson 4)
Image processing basics (Lesson 4, NumPy practice)

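Core concepts

Why do we need CNNs?

Imagine you want to recognize a cat in an image:

Problems with a fully connected network:

Flattening a 28x28 image into a 784-dimensional vector discards spatial information
Too many parameters (the first layer alone needs 784x128 = 100,352 weights)
It does not generalize well to shifted or rotated versions of an image

The CNN solution:

Local receptive fields: each unit looks at only a small patch of the image
Parameter sharing: the same feature detector scans the whole image
Translation invariance: the cat can be recognized wherever it appears in the image (the convolution itself is translation-equivariant; pooling adds approximate invariance to small shifts)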
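To make the effect of parameter sharing concrete, the sketch below compares the size of the fully connected first layer above with a convolutional first layer. The choice of 32 filters of size 3x3 on a single-channel image is an assumption made for illustration, and both helper functions are hypothetical names, not part of any library.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Fully connected: 784 inputs -> 128 units (100,352 weights + 128 biases)
dense_first_layer = dense_params(28 * 28, 128)

# Convolutional (assumed for illustration): 32 filters of 3x3 on 1 channel
conv_first_layer = conv2d_params(3, 3, 1, 32)

print(dense_first_layer)  # 100480
print(conv_first_layer)   # 320
```

The convolutional layer's parameter count does not depend on the image size at all, because the same small kernel is reused at every position.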

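The three core CNN operations

1. Convolution - feature extraction

Like inspecting the image with a magnifying glass:

Kernel (filter): the specific pattern the magnifying glass looks for (e.g. an edge detector)
Sliding window: moves across the image and checks how well each patch matches the pattern
Feature map: records the match strength at each position

# Example: vertical edge detector
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]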
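As a rough illustration of how such a kernel produces a feature map, here is a minimal pure-Python sketch that slides the kernel over a small made-up image (bright on the left, dark on the right). The function name `conv2d_valid` and the toy image are assumptions for this example. Like most deep learning frameworks, it computes cross-correlation, i.e. the kernel is not flipped.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the kernel and the current image patch
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Vertical edge detector from the text
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

# Toy 4x6 image: bright (9) on the left half, dark (0) on the right half
image = [[9, 9, 9, 0, 0, 0] for _ in range(4)]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The response is zero over the flat regions and large where the window straddles the bright-to-dark boundary, which is exactly what an edge detector should report.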

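2. Pooling - downsampling

Like changing the zoom level on a map:

Max pooling: takes the maximum value in each region (keeps the most salient feature)
Average pooling: takes the mean of each region (smooths the response)
Effects:
- Reduces computation
- Increases robustness to small translations
- Helps limit overfitting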
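The two pooling variants can be sketched in a few lines of pure Python. The function `pool2d` and the 4x4 feature map are made up for this illustration; the window size equals the stride, and any leftover rows or columns that do not fill a whole window are dropped, which is an assumption mirroring the behaviour of unpadded pooling.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:  # average pooling
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [5, 6, 1, 2],
               [0, 2, 4, 4],
               [1, 1, 8, 0]]

print(pool2d(feature_map, mode="max"))  # [[6, 2], [2, 8]]
print(pool2d(feature_map, mode="avg"))  # [[3.75, 1.25], [1.0, 4.0]]
```

Either way the 4x4 map shrinks to 2x2, so the layers that follow process a quarter as many values.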

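3. Fully connected - classification

Maps high-level features to classes:

Flatten the final feature maps
Classify with an ordinary dense neural network
The last layer usually uses a Softmax activation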
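For reference, Softmax turns the raw scores of the last layer into probabilities that sum to 1. The sketch below is a minimal pure-Python version with made-up scores; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The class with the highest score keeps the highest probability, so taking the argmax of the Softmax output gives the predicted class.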

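Typical CNN architecture

Input image → [Convolution → Activation → Pooling] × N → Flatten → Fully connected layers → Output classes

Classic networks:

LeNet-5 (1998): one of the first successful CNNs, used to recognize handwritten digits on checks
AlexNet (2012): won the ImageNet competition and kicked off the deep learning boom
ResNet (2015): residual connections, which ease vanishing gradients and make very deep networks trainable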

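Convolution hyperparameters in detail

Number of filters: how many kinds of features are extracted
Kernel size: commonly 3x3 or 5x5
Stride: how far the window moves at each step; affects the output size
Padding:
- 'valid': no padding (the output is smaller than the input)
- 'same': pads so that the output has the same size as the input (when the stride is 1)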
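How these hyperparameters determine the output size can be written as a small formula: with 'valid' padding the output is floor((n - k) / s) + 1, and with 'same' padding it is ceil(n / s), which is the convention Keras follows. The helper `output_size` below is a hypothetical name used to trace a 32x32 input through an alternating stack of 3x3 convolutions and 2x2 poolings, the layout used later in this lesson.

```python
import math


def output_size(n, window, stride=1, padding="valid"):
    """Spatial output size of a Keras-style convolution or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - window) // stride + 1


# (window, stride) for: conv 3x3, pool 2x2, conv 3x3, pool 2x2, conv 3x3
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

size = 32
trace = [size]
for window, stride in layers:
    size = output_size(size, window, stride)
    trace.append(size)

print(trace)  # [32, 30, 15, 13, 6, 4]
```

With 64 filters in the last convolution, the final 4x4 maps flatten to 4 * 4 * 64 = 1024 values.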

Hands-on code

1. Prepare the CIFAR-10 dataset

import tensorflow as tf
import matplotlib.pyplot as plt

# Load the data
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Class names, in CIFAR-10 label order
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Preprocessing: scale pixel values to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

# Visualize the first 10 samples
plt.figure(figsize=(10,5))
for i in range(10):
    plt.subplot(2,5,i+1)
    plt.imshow(train_images[i])
    plt.title(class_names[train_labels[i][0]])
    plt.axis('off')
plt.show()

2. Build the CNN model

model = tf.keras.Sequential([
    # Conv layer 1: 32 filters of 3x3, ReLU activation
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
    # Max pooling: 2x2 window
    tf.keras.layers.MaxPooling2D((2,2)),
    # Conv layer 2: 64 filters of 3x3
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    # Conv layer 3: 64 filters of 3x3
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    # Flatten, then fully connected layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    # Output layer: 10 classes
    tf.keras.layers.Dense(10, activation='softmax')
])

model.summary()
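The parameter counts that `model.summary()` reports can be checked by hand. The sketch below recomputes them from the layer definitions above; the two helper functions are hypothetical names, and the layer keys simply follow Keras's default naming.

```python
def conv2d_params(kernel, in_channels, filters):
    """kernel x kernel weights per input channel per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


param_counts = {
    "conv2d": conv2d_params(3, 3, 32),      # RGB input, 32 filters
    "conv2d_1": conv2d_params(3, 32, 64),   # 32 input channels, 64 filters
    "conv2d_2": conv2d_params(3, 64, 64),   # 64 input channels, 64 filters
    "dense": dense_params(4 * 4 * 64, 64),  # flattened 4x4x64 feature maps
    "dense_1": dense_params(64, 10),        # 10 output classes
}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(f"{name:10s} {count:>7,}")
print(f"{'total':10s} {total_params:>7,}")  # 122,570
```

Pooling and flatten layers have no trainable parameters, so they do not appear in the count. Note that more than half of the parameters sit in the first dense layer, not in the convolutions.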

3. Train and evaluate

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc*100:.2f}%")

# Visualize the training process
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
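Besides plotting, `history.history` can be inspected directly, since it is a plain dict mapping each metric name to a list with one value per epoch. The sketch below is one possible way to find the best validation epoch and the final gap between training and validation accuracy. The function name `summarize_history` is hypothetical, and the numbers in `toy_history` are invented for illustration; they are not results from this model.

```python
def summarize_history(history_dict):
    """Return (best_epoch, best_val_accuracy, final_train_minus_val_gap)."""
    val_acc = history_dict["val_accuracy"]
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    gap = history_dict["accuracy"][-1] - val_acc[-1]
    return best_index + 1, val_acc[best_index], gap  # epochs are 1-based


# Invented values, shaped like history.history after 4 epochs
toy_history = {
    "accuracy": [0.45, 0.60, 0.70, 0.76],
    "val_accuracy": [0.55, 0.64, 0.69, 0.68],
}

best_epoch, best_val, gap = summarize_history(toy_history)
print(f"best epoch: {best_epoch}, val accuracy: {best_val:.2f}, final gap: {gap:.2f}")
```

A gap that keeps growing while validation accuracy stalls is the usual sign of overfitting. With a real run you would call `summarize_history(history.history)`.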

4. Visualize the convolution kernels

# Get the kernels of the first conv layer
layer = model.layers[0]
weights, biases = layer.get_weights()

# Normalize the filters to [0, 1] for display
f_min, f_max = weights.min(), weights.max()
filters = (weights - f_min) / (f_max - f_min)

# Visualize the first 6 filters (each is a 3x3 RGB patch)
plt.figure(figsize=(10,5))
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(filters[:,:,:,i])
    plt.axis('off')
plt.suptitle('Convolution kernel visualization')
plt.show()

5. Predict on a custom image

import cv2
import numpy as np

def predict_custom_image(img_path):
    # Read the image (OpenCV loads it in BGR order)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Preprocess: resize to 32x32, scale to [0, 1], add a batch dimension
    img_resized = cv2.resize(img, (32,32))
    img_normalized = img_resized / 255.0
    img_input = np.expand_dims(img_normalized, axis=0)

    # Predict
    predictions = model.predict(img_input)
    predicted_class = np.argmax(predictions)

    # Visualize
    plt.figure(figsize=(5,5))
    plt.imshow(img_resized)
    plt.title(f"Prediction: {class_names[predicted_class]}\nConfidence: {predictions[0][predicted_class]*100:.2f}%")
    plt.axis('off')
    plt.show()

# Usage example (you need to supply your own image)
# predict_custom_image('my_cat.jpg')
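Looking only at the argmax hides how close the runner-up classes were. As a small optional extension, the sketch below ranks the classes by probability and returns the top few. The helper `top_k` is a hypothetical name, and the probability vector is made up for illustration, not an actual model output.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up Softmax output for a single image
toy_probs = [0.02, 0.01, 0.05, 0.60, 0.04, 0.20, 0.03, 0.02, 0.02, 0.01]

top3 = top_k(toy_probs, class_names)
for label, prob in top3:
    print(f"{label}: {prob * 100:.1f}%")
```

With a real prediction you would pass `predictions[0]` as the first argument.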

Complete project

Project structure

lesson_24_cnn/
├── README.md
├── requirements.txt
├── cnn_cifar10.py         # Main program
├── utils/
│   ├── image_utils.py     # Image processing helpers
│   └── visualization.py   # Visualization helpers
├── data/                  # Custom test images
└── output/                # Output results
    ├── model_weights.h5
    ├── training_curve.png
    └── filters_visual.png

requirements.txt

tensorflow==2.8.0
matplotlib==3.4.0
opencv-python==4.5.5.*
numpy==1.21.0

cnn_cifar10.py

import tensorflow as tf
from utils import visualization as vis
from utils import image_utils as img_util


def build_cnn_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
        tf.keras.layers.MaxPooling2D((2,2)),
        tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2,2)),
        tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


def main():
    # Load the data and scale pixel values to [0, 1]
    (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
    train_images, test_images = train_images / 255.0, test_images / 255.0

    # Build the model
    model = build_cnn_model()
    model.summary()

    # Train the model
    history = model.fit(train_images, train_labels,
                        epochs=10,
                        validation_data=(test_images, test_labels))

    # Save the model (the output/ directory must already exist)
    model.save('output/model_weights.h5')

    # Evaluate and visualize
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    print(f"Test accuracy: {test_acc*100:.2f}%")
    vis.plot_training_history(history)
    vis.visualize_filters(model.layers[0])

    # Predict on a custom image
    img_path = 'data/my_test_image.jpg'  # Replace with the path to your image
    img_util.predict_and_show(model, img_path)


if __name__ == "__main__":
    main()

Results

Console output

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 conv2d (Conv2D)             (None, 30, 30, 32)        896
 max_pooling2d (MaxPooling2D  (None, 15, 15, 32)       0
 )
 conv2d_1 (Conv2D)           (None, 13, 13, 64)        18496
 max_pooling2d_1 (MaxPooling  (None, 6, 6, 64)         0
 2D)
 conv2d_2 (Conv2D)           (None, 4, 4, 64)          36928
 flatten (Flatten)           (None, 1024)              0
 dense (Dense)               (None, 64)                65600
 dense_1 (Dense)             (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
1563/1563 [==============================] - 15s 9ms/step - loss: 1.5216 - accuracy: 0.4473 - val_loss: 1.2635 - val_accuracy: 0.5494
...
Epoch 10/10
1563/1563 [==============================] - 14s 9ms/step - loss: 0.6852 - accuracy: 0.7619 - val_loss: 0.8845 - val_accuracy: 0.7032
Test accuracy: 70.32%