
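YOLOv11 Improvement Series---Conv---MSCB, a 2024 module combining depth-wise separable convolution with multi-scale convolution, for effective accuracy gains in YOLOv11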

news 2025/9/14 18:06:55

I. Introduction

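This article presents a new improvement: MSCB (multi-scale convolution block), a 2024 module that combines depth-wise convolution with multi-scale convolution. Its core component, Multi-scale Depth-wise Convolution (MSDC), is a modified convolutional neural network (CNN) structure designed to improve multi-scale feature extraction. The core idea is to convolve at several scales in parallel so that image features at different levels are captured, while keeping the computational efficiency of Depth-wise Convolution. I combine MSCB with C3k2 in two ways (C3k2_MSCB1 and C3k2_MSCB2) to strengthen YOLOv11's feature extraction and feature fusion. The combination is original to this article, and the article includes the code and the steps for adding it.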

Training info: YOLO11-C3k2-MSCB1 summary: 395 layers, 2,555,235 parameters, 2,555,219 gradients, 6.3 GFLOPs

Training info: YOLO11-C3k2-MSCB2 summary: 410 layers, 2,358,867 parameters, 2,358,851 gradients, 6.2 GFLOPs

Unmodified baseline: YOLO11 summary: 319 layers, 2,590,035 parameters, 2,590,019 gradients, 6.4 GFLOPs

Series column:

YOLOv11 improvements (replacing convolutions, adding attention, replacing the backbone, image denoising, dehazing, enhancement, etc.) for accuracy gains, a must-have for publishing papers: https://blog.csdn.net/m0_58941767/category_12987736.html?spm=1001.2014.3001.5482

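I. Introduction

II. Principle

1. Depth-wise separable convolution

2. Introducing multi-scale kernels

3. Combining depth-wise separable convolution with multi-scale convolution

III. Core code

IV. Usage

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

V. Formal training

5.1 yaml file 1

5.2 yaml file 2

5.3 Training code

5.4 Training process screenshots

VI. Summary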

II. Principle

Paper: official paper link

Code: official code link

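Multi-scale Depth-wise Convolution (MSDC) is a modified convolutional neural network (CNN) structure aimed at improving multi-scale feature extraction. Its core idea is to convolve at several scales so as to capture image features at different levels, while keeping the computational efficiency of Depth-wise Convolution.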

Below is an overview of the basic principles of MSDC:

1. Depth-wise separable convolution (Depth-wise Convolution)

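In a standard convolution, each kernel spans all input channels and produces one output feature map. The amount of computation is therefore proportional to the number of input channels times the number of output channels times the kernel area, and it grows quickly when the input and output channel counts are large.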

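Depth-wise separable convolution splits a standard convolution into two stages:
(1) Depth-wise convolution: each input channel is convolved with its own kernel. Each kernel processes only one channel and is usually small (e.g., 3x3).
(2) Point-wise convolution: a 1x1 convolution linearly combines the per-channel outputs, fusing information across channels.
Note that MSDC itself implements only the depth-wise stage; in MSCB, the point-wise stages are the 1x1 convolutions before and after it (pconv1 and pconv2 in the core code).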
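The two stages above can be sketched in PyTorch as follows. This is a minimal illustration with arbitrary sizes (64 input channels, 128 output channels, a 3x3 kernel), not code from the paper; it compares the weight count of a standard convolution with a depth-wise + point-wise pair that produces the same output shape:

```python
import torch
import torch.nn as nn

c_in, c_out, k = 64, 128, 3  # illustrative sizes, not taken from the article

# Standard convolution: every kernel spans all c_in input channels
standard = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)

# Depth-wise separable convolution
depthwise_separable = nn.Sequential(
    # (1) depth-wise: groups=c_in, so each kernel sees exactly one channel
    nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False),
    # (2) point-wise: a 1x1 convolution mixes the channels linearly
    nn.Conv2d(c_in, c_out, 1, bias=False),
)


def n_params(m: nn.Module) -> int:
    """Total number of trainable weights in a module."""
    return sum(p.numel() for p in m.parameters())


x = torch.rand(1, c_in, 32, 32)
print(standard(x).shape, depthwise_separable(x).shape)  # both torch.Size([1, 128, 32, 32])
print(n_params(standard), n_params(depthwise_separable))  # 73728 8768
```

With these sizes the standard convolution needs 64 × 128 × 9 = 73,728 weights, while the separable version needs 64 × 9 + 64 × 128 = 8,768, roughly 8.4 times fewer.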

2. Introducing multi-scale kernels

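Conventional CNNs usually extract features with a fixed kernel size (e.g., 3x3 or 5x5). A single kernel size, however, mainly captures features at one scale and cannot fully capture objects of different sizes and scales in an image. To address this, MSDC introduces multi-scale kernels.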

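In MSDC, kernels of several different sizes are used to extract multi-scale features. The common approach is to process the input feature map in parallel with kernels of different sizes: $3 \times 3$ kernels capture local detail, while $5 \times 5$ or $7 \times 7$ kernels capture wider context. Together these kernels respond to features at multiple scales (e.g., small and large objects), which helps the network perceive features at different scales, especially when object sizes vary widely within an image.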
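A minimal sketch of parallel multi-scale depth-wise branches, assuming 32 channels and the kernel sizes 3, 5 and 7 mentioned above (sizes chosen for illustration only). With padding k // 2, every odd-sized kernel keeps the spatial size, so the branch outputs can later be fused element-wise or along the channel axis; the weight count of each depth-wise branch grows only as channels × k²:

```python
import torch
import torch.nn as nn

channels = 32             # illustrative channel count
kernel_sizes = [3, 5, 7]  # the scales mentioned in the text

# One depth-wise branch per kernel size; padding=k // 2 preserves H and W for odd k
branches = nn.ModuleList(
    nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels, bias=False)
    for k in kernel_sizes
)

x = torch.rand(1, channels, 40, 40)
outs = [branch(x) for branch in branches]  # every branch sees the same input

for k, branch, out in zip(kernel_sizes, branches, outs):
    # weights per branch = channels * k * k (one k x k filter per channel)
    print(f"{k}x{k}: output {tuple(out.shape)}, weights {branch.weight.numel()}")
```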

3. Combining depth-wise separable convolution with multi-scale convolution

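By combining depth-wise convolution with multi-scale kernels, MSDC captures features at different scales while keeping the computation low. Concretely, MSDC applies depth-wise convolutions with several kernel sizes (e.g., $3 \times 3$, $5 \times 5$, $7 \times 7$) to the input feature map, one depth-wise branch per kernel size, and the per-scale outputs are then fused, for example by concatenation or summation. (The core code below uses kernel sizes 1, 3 and 5 by default, and the fusion step is performed in MSCB.)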

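The processing flow is as follows:
1. Depth-wise convolution: every kernel, at every scale, acts on a single input channel, so each channel is filtered independently at each scale.
2. Multi-scale feature extraction: kernels of different sizes run in parallel at the same stage, each extracting features from a different receptive field.
3. Feature fusion: the outputs of the different scales are fused by concatenation or summation, producing a feature map that contains multi-scale information. In MSCB, summation is used when shortcut=True and concatenation otherwise; the result is then channel-shuffled and projected by a 1x1 convolution.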
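The two fusion options and the channel shuffle used by the core code can be illustrated on toy tensors. In this sketch each of three branch outputs is filled with its branch index so the effect is visible; groups=3 is chosen for readability, whereas MSCB itself uses gcd(combined_channels, out_channels) groups:

```python
import torch


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Same reshape -> transpose -> flatten trick as channel_shuffle in the core code."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


# Pretend outputs of three depth-wise branches: 4 channels each, 2x2 maps, value = branch index
outs = [torch.full((1, 4, 2, 2), float(i)) for i in range(3)]

summed = sum(outs)               # summation: channel count stays 4, every value is 0 + 1 + 2
concat = torch.cat(outs, dim=1)  # concatenation: 3 * 4 = 12 channels, grouped by branch
shuffled = channel_shuffle(concat, groups=3)

print(summed.shape, concat.shape)  # torch.Size([1, 4, 2, 2]) torch.Size([1, 12, 2, 2])
print(shuffled[0, :, 0, 0].tolist())
# [0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
```

Summation keeps the channel count (cheaper for the following 1x1 projection), while concatenation multiplies it by the number of scales; after the shuffle, consecutive channels come from alternating scales instead of three separate blocks.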

III. Core code

```python
import math
from functools import partial

import torch
import torch.nn as nn
from timm.models.layers import trunc_normal_tf_
from timm.models.helpers import named_apply

__all__ = ['C3k2_MSCB1', 'C3k2_MSCB2']


def gcd(a, b):
    """Greatest common divisor, used to choose the number of channel-shuffle groups."""
    while b:
        a, b = b, a % b
    return a


def _init_weights(module, name, scheme=''):
    """Initialize conv / norm layers; applied to every submodule via timm's named_apply."""
    if isinstance(module, (nn.Conv2d, nn.Conv3d)):
        if scheme == 'normal':
            nn.init.normal_(module.weight, std=.02)
        elif scheme == 'trunc_normal':
            trunc_normal_tf_(module.weight, std=.02)
        elif scheme == 'xavier_normal':
            nn.init.xavier_normal_(module.weight)
        elif scheme == 'kaiming_normal':
            nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
        else:
            # EfficientNet-like initialization
            fan_out = module.kernel_size[0] * module.kernel_size[1] * module.out_channels
            fan_out //= module.groups
            nn.init.normal_(module.weight, 0, math.sqrt(2.0 / fan_out))
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.BatchNorm2d, nn.BatchNorm3d, nn.LayerNorm)):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)
    # Other types of layers can go here (e.g., nn.Linear, etc.)


def act_layer(act, inplace=False, neg_slope=0.2, n_prelu=1):
    """Build an activation layer from its name."""
    act = act.lower()
    if act == 'relu':
        layer = nn.ReLU(inplace)
    elif act == 'relu6':
        layer = nn.ReLU6(inplace)
    elif act == 'leakyrelu':
        layer = nn.LeakyReLU(neg_slope, inplace)
    elif act == 'prelu':
        layer = nn.PReLU(num_parameters=n_prelu, init=neg_slope)
    elif act == 'gelu':
        layer = nn.GELU()
    elif act == 'hswish':
        layer = nn.Hardswish(inplace)
    else:
        raise NotImplementedError('activation layer [%s] is not found' % act)
    return layer


def channel_shuffle(x, groups):
    """Interleave channels across `groups`."""
    batchsize, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups
    # reshape: (B, C, H, W) -> (B, groups, C // groups, H, W), then swap the two channel axes
    x = x.view(batchsize, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten back to (B, C, H, W)
    x = x.view(batchsize, -1, height, width)
    return x


class MSDC(nn.Module):
    """Multi-scale depth-wise convolution (MSDC): one depth-wise conv branch per kernel size."""

    def __init__(self, in_channels, kernel_sizes, stride, activation='relu6', dw_parallel=True):
        super().__init__()
        self.in_channels = in_channels
        self.kernel_sizes = kernel_sizes
        self.activation = activation
        self.dw_parallel = dw_parallel
        self.dwconvs = nn.ModuleList([
            nn.Sequential(
                # groups=in_channels makes this a depth-wise convolution
                nn.Conv2d(self.in_channels, self.in_channels, kernel_size, stride, kernel_size // 2,
                          groups=self.in_channels, bias=False),
                nn.BatchNorm2d(self.in_channels),
                act_layer(self.activation, inplace=True),
            )
            for kernel_size in self.kernel_sizes
        ])
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        # Apply every branch; returns a list with one output per kernel size
        outputs = []
        for dwconv in self.dwconvs:
            dw_out = dwconv(x)
            outputs.append(dw_out)
            if not self.dw_parallel:
                # sequential mode: the next branch sees the input plus this branch's output
                x = x + dw_out
        return outputs


class MSCB(nn.Module):
    """Multi-scale convolution block (MSCB): 1x1 expand -> MSDC -> fuse -> shuffle -> 1x1 project.

    `shortcut` selects the fusion mode (True: sum the scales, False: concatenate them);
    the residual connection is used whenever stride == 1.
    """

    def __init__(self, in_channels, out_channels, shortcut=False, stride=1, kernel_sizes=[1, 3, 5],
                 expansion_factor=2, dw_parallel=True, activation='relu6'):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        self.kernel_sizes = kernel_sizes
        self.expansion_factor = expansion_factor
        self.dw_parallel = dw_parallel
        self.add = shortcut
        self.activation = activation
        self.n_scales = len(self.kernel_sizes)
        # check stride value
        assert self.stride in [1, 2]
        # skip connection only if stride is 1
        self.use_skip_connection = self.stride == 1
        # expanded channels
        self.ex_channels = int(self.in_channels * self.expansion_factor)
        self.pconv1 = nn.Sequential(
            # point-wise (1x1) expansion
            nn.Conv2d(self.in_channels, self.ex_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.ex_channels),
            act_layer(self.activation, inplace=True),
        )
        self.msdc = MSDC(self.ex_channels, self.kernel_sizes, self.stride, self.activation,
                         dw_parallel=self.dw_parallel)
        # summing keeps ex_channels; concatenating gives ex_channels * n_scales
        self.combined_channels = self.ex_channels if self.add else self.ex_channels * self.n_scales
        self.pconv2 = nn.Sequential(
            # point-wise (1x1) projection, no activation
            nn.Conv2d(self.combined_channels, self.out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.out_channels),
        )
        if self.use_skip_connection and (self.in_channels != self.out_channels):
            self.conv1x1 = nn.Conv2d(self.in_channels, self.out_channels, 1, 1, 0, bias=False)
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        pout1 = self.pconv1(x)
        msdc_outs = self.msdc(pout1)
        if self.add:
            dout = 0
            for dwout in msdc_outs:
                dout = dout + dwout
        else:
            dout = torch.cat(msdc_outs, dim=1)
        dout = channel_shuffle(dout, gcd(self.combined_channels, self.out_channels))
        out = self.pconv2(dout)
        if self.use_skip_connection:
            if self.in_channels != self.out_channels:
                x = self.conv1x1(x)
            return x + out
        return out


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation (used after BN has been fused into the conv)."""
        return self.act(self.conv(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Two convolutions with an optional residual connection."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 3 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k_MSCB(C3):
    """C3k variant whose inner blocks are MSCB instead of Bottleneck."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k_MSCB module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        self.m = nn.Sequential(*(MSCB(c_, c_, shortcut) for _ in range(n)))


class C3k2_MSCB1(C2f):
    """C3k2 variant: uses C3k blocks when c3k=True, otherwise MSCB blocks."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2_MSCB1 module."""
        super().__init__(c1, c2, n, shortcut, g, e)
        # c3k is True in the backbone and in the network's last C3k2, which take the C3k path;
        # all other layers use MSCB
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else MSCB(self.c, self.c, shortcut) for _ in range(n)
        )


class C3k2_MSCB2(C2f):
    """C3k2 variant: uses C3k_MSCB blocks when c3k=True, otherwise standard Bottleneck blocks."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2_MSCB2 module."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k_MSCB(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g)
            for _ in range(n)
        )


if __name__ == "__main__":
    # Quick shape check on a random feature map
    image = torch.rand(1, 64, 240, 240)
    model = MSCB(64, 64)
    out = model(image)
    print(out.size())  # torch.Size([1, 64, 240, 240])
```

IV. Usage

4.1 Modification 1

Step one is to create the file: in the ultralytics/nn folder, create a directory named 'Addmodules', then create a new .py file inside it and paste the core code into it.
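If you prefer to script this step, a small helper along these lines would do it. This is a hypothetical convenience function (`add_module_file` is a made-up name), and the file name `MSCB.py` is an assumption, since the article does not name the file:

```python
import tempfile
from pathlib import Path


def add_module_file(ultralytics_dir: str, filename: str, source: str) -> Path:
    """Hypothetical helper: create <ultralytics_dir>/nn/Addmodules/<filename> containing `source`."""
    pkg_dir = Path(ultralytics_dir) / "nn" / "Addmodules"
    pkg_dir.mkdir(parents=True, exist_ok=True)  # create the folder (and parents) if missing
    target = pkg_dir / filename
    target.write_text(source, encoding="utf-8")  # the pasted core code
    return target


if __name__ == "__main__":
    # Demo in a temporary directory instead of a real ultralytics checkout
    with tempfile.TemporaryDirectory() as tmp:
        path = add_module_file(f"{tmp}/ultralytics", "MSCB.py", "# core code goes here\n")
        print(path.relative_to(tmp).as_posix())  # ultralytics/nn/Addmodules/MSCB.py
```

In a real project, `ultralytics_dir` would be the `ultralytics/` package folder of your local copy and `source` the core code above.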

4.2 Modification 2

Step two: in the same directory, create a new file named '__init__.py' and import our modules in it.
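As a hedged sketch, assuming the core code was saved as `MSCB.py` (a hypothetical name), `__init__.py` would contain the single line `from .MSCB import *`. Because the core code declares `__all__ = ['C3k2_MSCB1', 'C3k2_MSCB2']`, a star import re-exports only those two classes, not helpers such as `Conv` or `MSCB`. The script below checks this on a throwaway package (`addmodules_demo` and `star_exports` are made-up names) with stub classes in place of the real code:

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Stub standing in for the core code: same __all__, empty classes
STUB_SOURCE = (
    "__all__ = ['C3k2_MSCB1', 'C3k2_MSCB2']\n"
    "class Conv: pass\n"
    "class MSCB: pass\n"
    "class C3k2_MSCB1: pass\n"
    "class C3k2_MSCB2: pass\n"
)


def star_exports(module_source: str, pkg_name: str = "addmodules_demo") -> set:
    """Build a temp package whose __init__.py is `from .MSCB import *`; return its public names."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / pkg_name
        pkg_dir.mkdir()
        (pkg_dir / "MSCB.py").write_text(module_source, encoding="utf-8")
        (pkg_dir / "__init__.py").write_text("from .MSCB import *\n", encoding="utf-8")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            package = importlib.import_module(pkg_name)
        finally:
            sys.path.remove(tmp)
        names = {
            name for name, value in vars(package).items()
            # skip dunder attributes and the submodule object `MSCB` itself
            if not name.startswith("_") and not isinstance(value, types.ModuleType)
        }
        # forget the demo package so the function can be called again
        for mod in [m for m in sys.modules if m == pkg_name or m.startswith(pkg_name + ".")]:
            del sys.modules[mod]
    return names


print(sorted(star_exports(STUB_SOURCE)))  # ['C3k2_MSCB1', 'C3k2_MSCB2']
```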

4.3 Modification 3

Step three: open 'ultralytics/nn/tasks.py' and import and register our modules there.
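What registration achieves can be illustrated with a heavily simplified, hypothetical stand-in for `parse_model` in `tasks.py`; the real function differs between ultralytics versions, and this is not its source. As far as I understand it, the module name in the yaml is resolved from the names imported into `tasks.py`, and for C2f/C3k2-style modules the input channel count is prepended and the repeat count `n` is inserted as the third constructor argument. That is why the new classes need to be both imported and added to the same module sets as `C3k2`. Width and depth scaling are omitted here, and `DummyC3k2`, `REGISTRY`, `REPEAT_MODULES` and `build_layer` are made-up names:

```python
import torch.nn as nn


class DummyC3k2(nn.Module):
    """Stand-in with the same signature as C3k2_MSCB1: (c1, c2, n, c3k, e, g, shortcut)."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        super().__init__()
        self.cfg = dict(c1=c1, c2=c2, n=n, c3k=c3k, e=e)


REGISTRY = {"C3k2_MSCB1": DummyC3k2}  # name lookup: plays the role of "imported into tasks.py"
REPEAT_MODULES = {DummyC3k2}          # modules that take the repeat count n as 3rd argument


def build_layer(name, c1, n, args):
    """Toy version of turning one yaml row [from, n, name, args] into a module."""
    m = REGISTRY[name]      # an unregistered name fails here with KeyError
    args = [c1, *args]      # prepend the input channels coming from the previous layer
    if m in REPEAT_MODULES:
        args.insert(2, n)   # (c1, c2, n, ...): the block repeats internally
        n = 1
    return m(*args) if n == 1 else nn.Sequential(*(m(*args) for _ in range(n)))


# yaml row in the style of YOLO11: [-1, 2, C3k2_MSCB1, [256, False, 0.25]]
layer = build_layer("C3k2_MSCB1", c1=64, n=2, args=[256, False, 0.25])
print(layer.cfg)  # {'c1': 64, 'c2': 256, 'n': 2, 'c3k': False, 'e': 0.25}
```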