当前位置：首页 > news >正文

YOLOv8涨点技巧：手把手教程，注意力机制如何在不同数据集上实现涨点的工作，内涵多种网络改进方法

news 2025/8/27 17:15:23

💡💡💡本文独家改进：手把手教程，解决注意力机制引入到YOLOv8在自己数据集不涨点的问题点，本文提供五种改进方法来解决此问题；

ContextAggregation | 亲测在血细胞检测项目中涨点，提供五种改进方法，最大 map@0.5 从原始0.895提升至0.916

1.验证数据集

1.1 血细胞检测介绍

数据来源于医疗相关数据集，目的是解决血细胞检测问题。任务是通过显微图像读数来检测每张图像中的所有红细胞（RBC）、白细胞（WBC）以及血小板 (Platelets)共三类

意义：选择该数据集的原因是我们血液中RBC、WBC和血小板的密度提供了大量关于免疫系统和血红蛋白的信息，这些信息可以帮助我们初步地识别一个人是否健康，如果在其血液中发现了任何差异，我们就可以迅速采取行动来进行下一步的诊断。然而通过显微镜手动查看样品是一个繁琐的过程，这也是深度学习模式能够发挥重要作用的地方，YOLOv8可以从显微图像中分类和检测血细胞，并且达到很高的精确度。

1.2 血细胞检测数据集介绍

数据集大小：364张

检测难点：1）类别不平衡；2）同个类别相互遮挡、不同类别相互遮挡；3）检测物长宽差异较大；等

2.Context Aggregation注意力举例分析

2.1 Context Aggregation介绍

论文：https://arxiv.org/abs/2106.01401

摘要
卷积神经网络(CNNs)在计算机视觉中无处不在，具有无数有效和高效的变化。最近，Container——最初是在自然语言处理中引入的——已经越来越多地应用于计算机视觉。早期的用户继续使用CNN的骨干，最新的网络是端到端无CNN的Transformer解决方案。最近一个令人惊讶的发现表明，一个简单的基于MLP的解决方案，没有任何传统的卷积或Transformer组件，可以产生有效的视觉表示。虽然CNN、Transformer和MLP-Mixers可以被视为完全不同的架构，但我们提供了一个统一的视图，表明它们实际上是在神经网络堆栈中聚合空间上下文的更通用方法的特殊情况。我们提出了Container(上下文聚合网络)，一个用于多头上下文聚合的通用构建块，它可以利用Container的长期交互作用，同时仍然利用局部卷积操作的诱导偏差，导致更快的收敛速度，这经常在CNN中看到。我们的Container架构在ImageNet上使用22M参数实现了82.7%的Top-1精度，比DeiT-Small提高了2.8，并且可以在短短200个时代收敛到79.9%的Top-1精度。比起相比的基于Transformer的方法不能很好地扩展到下游任务依赖较大的输入图像的分辨率,我们高效的网络,名叫CONTAINER-LIGHT,可以使用在目标检测和分割网络如DETR实例,RetinaNet和Mask-RCNN获得令人印象深刻的检测图38.9,43.8,45.1和掩码mAP为41.3，与具有可比较的计算和参数大小的ResNet-50骨干相比，分别提供了6.6、7.3、6.9和6.6 pts的较大改进。与DINO框架下的DeiT相比，我们的方法在自监督学习方面也取得了很好的效果。

仅需22M参数量，所提CONTAINER在ImageNet数据集取得了82.7%的的top1精度，以2.8%优于DeiT-Small；此外仅需200epoch即可达到79.9%的top1精度。不用于难以扩展到下游任务的Transformer方案(因为需要更高分辨率)，该方案CONTAINER-LIGHT可以嵌入到DETR、RetinaNet以及Mask-RCNN等架构中用于目标检测、实例分割任务并分别取得了6.6，7.6，6.9指标提升。

提供了一个统一视角表明：它们均是更广义方案下通过神经网络集成空间上下文信息的特例。我们提出了CONTAINER(CONText AggregatIon NEtwoRK)，一种用于多头上下文集成（Context Aggregation）的广义构建模块。

本文有以下几点贡献：

提出了关于主流视觉架构的一个统一视角；
提出了一种新颖的模块CONTAINER，它通过可学习参数和响应的架构混合使用了静态与动态关联矩阵(Affinity Matrix)，在图像分类任务中表现出了很强的结果；
提出了一种高效&有效的扩展CONTAINER-LIGHT在检测与分割方面取得了显著的性能提升。

代码详见：Yolov8涨点神器：用于微小目标检测的上下文增强和特征细化网络ContextAggregation，助力小目标检测，暴力涨点-CSDN博客

3.多种网络结构进行验证

结果分析

3.1 YOLOv8_ContextAggregation1.yaml

# Ultralytics YOLO 🚀, GPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'# [depth, width, max_channels]n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPss: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPsm: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPsl: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPsx: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs# YOLOv8.0n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2- [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4- [-1, 3, C2f, [128, True]]- [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8- [-1, 6, C2f, [256, True]]- [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16- [-1, 6, C2f, [512, True]]- [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32- [-1, 3, C2f, [1024, True]]- [-1, 1, SPPF, [1024, 5]]  # 9# YOLOv8.0n head
head:- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 6], 1, Concat, [1]]  # cat backbone P4- [-1, 3, C2f, [512]]  # 12- [-1, 1, ContextAggregation, [512]]  # 13- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 4], 1, Concat, [1]]  # cat backbone P3- [-1, 3, C2f, [256]]  # 16 (P3/8-small)- [-1, 1, ContextAggregation, [256]]  # 17 (P5/32-large)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 13], 1, Concat, [1]]  # cat head P4- [-1, 3, C2f, [512]]  # 20 (P4/16-medium)- [-1, 1, ContextAggregation, [512]]  # 21 (P5/32-large)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 9], 1, Concat, [1]]  # cat head P5- [-1, 3, C2f, [1024]]  # 24 (P5/32-large)- [-1, 1, ContextAggregation, [1024]]  # 25 (P5/32-large)- [[17, 21, 25], 1, Detect, [nc]]  # Detect(P3, P4, P5)

map@0.5 从原始0.895提升至0.897

YOLOv8_ContextAggregation1 summary (fused): 204 layers, 3009125 parameters, 0 gradients, 8.1 GFLOPsClass     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:04<00:00,  1.36it/s]all         87       1138      0.816      0.893      0.897      0.602WBC         87         87      0.971      0.989      0.985      0.771RBC         87        968      0.699      0.836      0.841      0.584Platelets         87         83      0.777      0.855      0.865      0.452

3.2 YOLOv8_ContextAggregation2.yaml

# Ultralytics YOLO 🚀, GPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'# [depth, width, max_channels]n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPss: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPsm: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPsl: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPsx: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs# YOLOv8.0n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2- [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4- [-1, 3, C2f, [128, True]]- [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8- [-1, 6, C2f, [256, True]]- [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16- [-1, 6, C2f, [512, True]]- [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32- [-1, 3, C2f, [1024, True]]- [-1, 1, SPPF, [1024, 5]]  # 9# YOLOv8.0n head
head:- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 6], 1, Concat, [1]]  # cat backbone P4- [-1, 3, C2f, [512]]  # 12- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 4], 1, Concat, [1]]  # cat backbone P3- [-1, 3, C2f, [256]]  # 15 (P3/8-small)- [-1, 1, ContextAggregation, [256]]  # 16 (P5/32-large)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 12], 1, Concat, [1]]  # cat head P4- [-1, 3, C2f, [512]]  # 19 (P4/16-medium)- [-1, 1, ContextAggregation, [512]]  # 20 (P5/32-large)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 9], 1, Concat, [1]]  # cat head P5- [-1, 3, C2f, [1024]]  # 23 (P5/32-large)- [-1, 1, ContextAggregation, [1024]]  # 24 (P5/32-large)- [[16, 20, 24], 1, Detect, [nc]]  # Detect(P3, P4, P5)

map@0.5 从原始0.895提升至0.907

YOLOv8_ContextAggregation2 summary (fused): 195 layers, 3008482 parameters, 0 gradients, 8.1 GFLOPsClass     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:03<00:00,  1.59s/it]all         87       1138      0.824      0.892      0.907      0.613WBC         87         87      0.984          1      0.988      0.785RBC         87        968      0.727      0.836      0.851      0.596Platelets         87         83       0.76       0.84      0.881      0.457

3.3 YOLOv8_ContextAggregation3.yaml

# Ultralytics YOLO 🚀, GPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'# [depth, width, max_channels]n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPss: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPsm: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPsl: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPsx: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs# YOLOv8.0n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2- [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4- [-1, 3, C2f, [128, True]]- [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8- [-1, 6, C2f, [256, True]]- [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16- [-1, 6, C2f, [512, True]]- [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32- [-1, 3, C2f, [1024, True]]- [-1, 1, SPPF, [1024, 5]]  # 9- [-1, 1, ContextAggregation, [1024]]  # 10# YOLOv8.0n head
head:- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 6], 1, Concat, [1]]  # cat backbone P4- [-1, 3, C2f, [512]]  # 13- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 4], 1, Concat, [1]]  # cat backbone P3- [-1, 3, C2f, [256]]  # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 13], 1, Concat, [1]]  # cat head P4- [-1, 3, C2f, [512]]  # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 10], 1, Concat, [1]]  # cat head P5- [-1, 3, C2f, [1024]]  # 22 (P5/32-large)- [[16, 19, 22], 1, Detect, [nc]]  # Detect(P3, P4, P5)

map@0.5 从原始0.895提升至0.904

YOLOv8_ContextAggregation3 summary (fused): 177 layers, 3007516 parameters, 0 gradients, 8.1 GFLOPsClass     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:03<00:00,  1.91s/it]all         87       1138      0.835      0.874      0.904       0.61WBC         87         87      0.979      0.989      0.993      0.779RBC         87        968      0.722      0.841       0.86      0.597Platelets         87         83      0.804      0.792       0.86      0.453

3.4 YOLOv8_ContextAggregation4.yaml

# Ultralytics YOLO 🚀, GPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'# [depth, width, max_channels]n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPss: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPsm: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPsl: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPsx: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs# YOLOv8.0n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2- [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4- [-1, 3, C2f, [128, True]]- [-1, 1, ContextAggregation, [128]]  # 3- [-1, 1, Conv, [256, 3, 2]]  # 4-P3/8- [-1, 6, C2f, [256, True]]- [-1, 1, ContextAggregation, [256]]  # 6- [-1, 1, Conv, [512, 3, 2]]  # 7-P4/16- [-1, 6, C2f, [512, True]]- [-1, 1, ContextAggregation, [512]]  # 9- [-1, 1, Conv, [1024, 3, 2]]  # 10-P5/32- [-1, 3, C2f, [1024, True]]- [-1, 1, ContextAggregation, [1024]]  # 12- [-1, 1, SPPF, [1024, 5]]  # 13# YOLOv8.0n head
head:- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 9], 1, Concat, [1]]  # cat backbone P4- [-1, 3, C2f, [512]]  # 16- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 5], 1, Concat, [1]]  # cat backbone P3- [-1, 3, C2f, [256]]  # 19 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 16], 1, Concat, [1]]  # cat head P4- [-1, 3, C2f, [512]]  # 22 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 13], 1, Concat, [1]]  # cat head P5- [-1, 3, C2f, [1024]]  # 25 (P5/32-large)- [[19, 22, 25], 1, Detect, [nc]]  # Detect(P3, P4, P5)

map@0.5 从原始0.895提升至0.896

YOLOv8_ContextAggregation4 summary (fused): 204 layers, 3008645 parameters, 0 gradients, 8.1 GFLOPsClass     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:04<00:00,  1.37it/s]all         87       1138      0.829      0.884      0.896      0.609WBC         87         87      0.988          1       0.99      0.796RBC         87        968      0.741      0.796      0.843      0.581Platelets         87         83      0.759      0.855      0.854      0.451

3.5 YOLOv8_ContextAggregation5.yaml

# Ultralytics YOLO 🚀, GPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'# [depth, width, max_channels]n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPss: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPsm: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPsl: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPsx: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs# YOLOv8.0n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2- [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4- [-1, 3, C2f, [128, True]]- [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8- [-1, 6, C2f, [256, True]]- [-1, 1, ContextAggregation, [256]]  # 5- [-1, 1, Conv, [512, 3, 2]]  # 6-P4/16- [-1, 6, C2f, [512, True]]- [-1, 1, ContextAggregation, [512]]  # 8- [-1, 1, Conv, [1024, 3, 2]]  # 9-P5/32- [-1, 3, C2f, [1024, True]]- [-1, 1, ContextAggregation, [1024]]  # 11- [-1, 1, SPPF, [1024, 5]]  # 12# YOLOv8.0n head
head:- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 8], 1, Concat, [1]]  # cat backbone P4- [-1, 3, C2f, [512]]  # 15- [-1, 1, nn.Upsample, [None, 2, 'nearest']]- [[-1, 4], 1, Concat, [1]]  # cat backbone P3- [-1, 3, C2f, [256]]  # 18 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 15], 1, Concat, [1]]  # cat head P4- [-1, 3, C2f, [512]]  # 21 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 12], 1, Concat, [1]]  # cat head P5- [-1, 3, C2f, [1024]]  # 24 (P5/32-large)- [[18, 21, 24], 1, Detect, [nc]]  # Detect(P3, P4, P5)

map@0.5 从原始0.895提升至0.916

YOLOv8_ContextAggregation5 summary (fused): 195 layers, 3008482 parameters, 0 gradients, 8.1 GFLOPs
8.0920064Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:03<00:00,  1.74s/it]all         87       1138      0.837      0.912      0.916      0.622WBC         87         87      0.971          1       0.99      0.791RBC         87        968      0.737      0.851      0.862      0.607Platelets         87         83      0.803      0.887      0.897      0.469