
[AudioClassificationModelZoo-Pytorch] A PyTorch-based sound event detection and classification system

Source code: https://github.com/Shybert-AI/AudioClassificationModelZoo-Pytorch


Model benchmark

| Model | batch_size | FLOPs (G) | Params (M) | Features | Dataset | Classes | Val. accuracy | Val. precision | Val. recall | Val. F1-score |
|---|---|---|---|---|---|---|---|---|---|---|
| EcapaTdnn | 128 | 0.48 | 6.1 | mel | UrbanSound8K | 10 | 0.974 | 0.972 | 0.967 | 0.967 |
| PANNS(CNN6) | 128 | 0.98 | 4.57 | mel | UrbanSound8K | 10 | 0.971 | 0.963 | 0.954 | 0.955 |
| TDNN | 128 | 0.21 | 2.60 | mel | UrbanSound8K | 10 | 0.968 | 0.964 | 0.959 | 0.958 |
| PANNS(CNN14) | 128 | 1.98 | 79.7 | mel | UrbanSound8K | 10 | 0.966 | 0.956 | 0.957 | 0.952 |
| PANNS(CNN10) | 128 | 1.29 | 4.96 | mel | UrbanSound8K | 10 | 0.964 | 0.955 | 0.955 | 0.95 |
| DTFAT(MaxAST) | 16 | 8.32 | 68.32 | mel | UrbanSound8K | 10 | 0.963 | 0.939 | 0.935 | 0.933 |
| EAT-M-Transformer | 128 | 0.16 | 1.59 | mel | UrbanSound8K | 10 | 0.935 | 0.905 | 0.907 | 0.9 |
| AST | 16 | 5.28 | 85.26 | mel | UrbanSound8K | 10 | 0.932 | 0.893 | 0.887 | 0.884 |
| TDNN_GRU_SE | 256 | 0.26 | 3.02 | mel | UrbanSound8K | 10 | 0.929 | 0.916 | 0.907 | 0.904 |
| mn10_as | 128 | 0.03 | 4.21 | mel | UrbanSound8K | 10 | 0.912 | 0.88 | 0.894 | 0.878 |
| dymn10_as | 128 | 0.01 | 4.76 | mel | UrbanSound8K | 10 | 0.904 | 0.886 | 0.883 | 0.872 |
| ERes2NetV2 | 128 | 0.87 | 5.07 | mel | UrbanSound8K | 10 | 0.874 | 0.828 | 0.832 | 0.818 |
| ResNetSE_GRU | 128 | 1.84 | 10.31 | mel | UrbanSound8K | 10 | 0.865 | 0.824 | 0.827 | 0.813 |
| ResNetSE | 128 | 1.51 | 7.15 | mel | UrbanSound8K | 10 | 0.859 | 0.82 | 0.819 | 0.807 |
| CAMPPlus | 128 | 0.47 | 7.30 | mel | UrbanSound8K | 10 | 0.842 | 0.793 | 0.788 | 0.778 |
| HTS-AT | 16 | 5.70 | 27.59 | mel | UrbanSound8K | 10 | 0.84 | 0.802 | 0.796 | 0.795 |
| EffilecentNet_B2 | 128 | - | 7.73 | mel | UrbanSound8K | 10 | 0.779 | 0.718 | 0.741 | 0.712 |
| ERes2Net | 128 | 1.39 | 6.22 | mel | UrbanSound8K | 10 | 0.778 | 0.808 | 0.787 | 0.779 |
| Res2Net | 128 | 0.04 | 5.09 | mel | UrbanSound8K | 10 | 0.723 | 0.669 | 0.672 | 0.648 |
| MobileNetV4 | 128 | 0.03 | 2.51 | mel | UrbanSound8K | 10 | 0.608 | 0.553 | 0.549 | 0.523 |

Notes:

  The test set was built by taking one clip out of every 10 in the dataset, 874 clips in total.
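A minimal sketch of such a split, assuming the clips are kept in a fixed order and every clip whose index is a multiple of 10 goes to the test set. The function name `split_every_nth` is hypothetical and not taken from create_data.py. Since UrbanSound8K contains 8,732 clips, taking indices 0, 10, ..., 8730 gives exactly 874 test clips, matching the count above.

```python
def split_every_nth(items, n=10):
    """Put every n-th item (indices 0, n, 2n, ...) into the test split, the rest into train."""
    train, test = [], []
    for i, item in enumerate(items):
        (test if i % n == 0 else train).append(item)
    return train, test


if __name__ == "__main__":
    clips = [f"clip_{i}.wav" for i in range(8732)]  # UrbanSound8K contains 8,732 clips
    train, test = split_every_nth(clips)
    print(len(train), len(test))  # 7858 874
```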

4. Prepare the data

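  Generate the dataset lists: label_list.txt, train_list.txt and test_list.txt.
Running create_data.py generates them. The script offers several ways of building the lists for different datasets; see the code for details.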

python create_data.py

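  The generated lists look like this: each line holds the audio file path followed by that clip's label (labels start at 0), with the path and the label separated by a tab character (\t).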

dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav	4
dataset/UrbanSound8K/audio/fold9/105029-7-2-5.wav	7
dataset/UrbanSound8K/audio/fold3/107228-5-0-0.wav	5
dataset/UrbanSound8K/audio/fold4/109711-3-2-4.wav	3
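To illustrate the list format, here is a sketch that builds such lines from UrbanSound8K file names and parses them back. It relies on the UrbanSound8K naming scheme `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`, where the second field is the class ID (e.g. `104817-4-0-2.wav` has label 4, as in the lines above). The helper names are hypothetical, and create_data.py may build the lists differently.

```python
from pathlib import PurePosixPath


def label_from_urbansound_name(path):
    """Return the classID field of an UrbanSound8K file name (fsID-classID-occurrenceID-sliceID.wav)."""
    return int(PurePosixPath(path).stem.split("-")[1])


def make_list_line(path):
    """Format one list entry: audio path, a tab, then the integer label."""
    return f"{path}\t{label_from_urbansound_name(path)}"


def parse_list_line(line):
    """Split a list entry back into (path, label)."""
    path, label = line.rstrip("\n").split("\t")
    return path, int(label)


if __name__ == "__main__":
    line = make_list_line("dataset/UrbanSound8K/audio/fold2/104817-4-0-2.wav")
    print(parse_list_line(line))
```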

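5. Feature extraction (optional; extracting the features beforehand makes training about 36 times faster). Pre-extracted feature files and trained model files are available for download. Put the models in the model directory and the features in the features directory.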

Link: https://pan.baidu.com/s/15ziJovO3t41Nqgqtmovuew  Extraction code: 8a59

python extract_feature.py
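The speed-up comes from computing the mel features once and reading them from disk in every epoch instead of recomputing them on the fly. The sketch below illustrates that idea with a plain NumPy log-mel spectrogram. The sample rate (16 kHz), FFT size (1024), hop length (320) and 64 mel bands are assumptions chosen for illustration, not values read from extract_feature.py, and `log_mel`, `cache_features` and `load_features` are hypothetical names.

```python
from pathlib import Path

import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge (empty if left == center)
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (empty if center == right)
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def log_mel(wave, sr=16000, n_fft=1024, hop=320, n_mels=64):
    """Log-mel spectrogram of a 1-D waveform, shape (n_mels, n_frames)."""
    pad = n_fft // 2
    x = np.pad(np.asarray(wave, dtype=np.float64), (pad, pad), mode="reflect")  # centred frames
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-6).T                         # (n_mels, n_frames)


def cache_features(wave, out_path):
    """Compute the features once and store them as a .npy file."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    feats = log_mel(wave).astype(np.float32)
    np.save(out_path, feats)
    return feats


def load_features(path):
    """Read cached features instead of recomputing them in every epoch."""
    return np.load(path)
```

With these assumed settings, a 2-second clip yields a (64, 101) array; calling `cache_features(wave, "features/<name>.npy")` once per clip and `load_features` inside the training loop is the whole trick.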

6. Training. Select the model to train with the --model_type argument.

For example: EcapaTdnn, PANNS(CNN6), TDNN, PANNS(CNN14), PANNS(CNN10), DTFAT(MaxAST), EAT-M-Transformer, AST, TDNN_GRU_SE, mn10_as, dymn10_as, ERes2NetV2, ResNetSE_GRU, ResNetSE, CAMPPlus, HTS-AT, EffilecentNet_B2, ERes2Net, Res2Net, MobileNetV4

python train.py --model_type EAT-M-Transformer
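A sketch of how a `--model_type` switch like this is commonly wired up with `argparse`, restricting the value to the model names listed above. This is an illustration only: train.py's actual parser, its other options and the exact strings it accepts are not shown in this post.

```python
import argparse

# Model names as listed above, used as the accepted values for --model_type.
MODEL_TYPES = [
    "EcapaTdnn", "PANNS(CNN6)", "TDNN", "PANNS(CNN14)", "PANNS(CNN10)",
    "DTFAT(MaxAST)", "EAT-M-Transformer", "AST", "TDNN_GRU_SE", "mn10_as",
    "dymn10_as", "ERes2NetV2", "ResNetSE_GRU", "ResNetSE", "CAMPPlus",
    "HTS-AT", "EffilecentNet_B2", "ERes2Net", "Res2Net", "MobileNetV4",
]


def parse_args(argv=None):
    """Parse the command line; argparse rejects any name not in MODEL_TYPES."""
    parser = argparse.ArgumentParser(description="Train an audio classification model")
    parser.add_argument("--model_type", choices=MODEL_TYPES, required=True,
                        help="network architecture to train")
    return parser.parse_args(argv)
```

For instance, `parse_args(["--model_type", "EAT-M-Transformer"]).model_type` returns `"EAT-M-Transformer"`. If names containing parentheses such as `PANNS(CNN6)` are accepted as-is, quote them on a POSIX shell (`--model_type "PANNS(CNN6)"`), because unquoted parentheses are shell syntax.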

  Training log with online feature extraction:

Epoch: 10
Train: 100%|██████████| 62/62 [07:28<00:00,  7.23s/it, BCELoss=0.931, accuracy=0.502, precision=0.563, recall=0.508, F1-score=0.505]
Valid: 100%|██████████| 14/14 [00:53<00:00,  3.82s/it, BCELoss=1.19, accuracy=0.425, precision=0.43, recall=0.393, F1-score=0.362]
Epoch: 11
Train: 100%|██████████| 62/62 [07:23<00:00,  7.16s/it, BCELoss=2.17, accuracy=0.377, precision=0.472, recall=0.386, F1-score=0.375]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=2.7, accuracy=0.362, precision=0.341, recall=0.328, F1-score=0.295]
Epoch: 12
Train: 100%|██████████| 62/62 [07:20<00:00,  7.11s/it, BCELoss=1.8, accuracy=0.297, precision=0.375, recall=0.308, F1-score=0.274]
Valid: 100%|██████████| 14/14 [00:48<00:00,  3.47s/it, BCELoss=1.08, accuracy=0.287, precision=0.317, recall=0.285, F1-score=0.234]

  Training log with offline (pre-extracted) features:

Epoch: 1
Train: 100%|██████████| 62/62 [00:12<00:00,  4.77it/s, BCELoss=8.25, accuracy=0.0935, precision=0.0982, recall=0.0878, F1-score=0.0741]
Valid: 100%|██████████| 14/14 [00:00<00:00, 29.53it/s, BCELoss=5.98, accuracy=0.142, precision=0.108, recall=0.129, F1-score=0.0909]
Model saved in the folder :  model
Model name is :  SAR_Pesudo_ResNetSE_s0_BCELoss
Epoch: 2
Train: 100%|██████████| 62/62 [00:12<00:00,  4.93it/s, BCELoss=7.71, accuracy=0.117, precision=0.144, recall=0.113, F1-score=0.0995]
Valid: 100%|██████████| 14/14 [00:00<00:00, 34.54it/s, BCELoss=8.15, accuracy=0.141, precision=0.0811, recall=0.133, F1-score=0.0785]
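The logs report a BCELoss alongside accuracy, precision, recall and F1-score, the same four metrics as in the benchmark table. The NumPy sketch below shows one plausible way to compute them for a 10-class problem: binary cross-entropy on sigmoid outputs against one-hot targets, and precision/recall/F1 macro-averaged over classes. Both the one-hot BCE formulation and the macro averaging are assumptions; the repository may compute these values differently.

```python
import numpy as np


def bce_with_logits(logits, labels, n_classes):
    """Mean binary cross-entropy between sigmoid(logits) and one-hot targets."""
    targets = np.eye(n_classes)[labels]  # (batch, n_classes) one-hot matrix
    # Numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)  # per-class F1, averaged below
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "F1-score": float(np.mean(f1s)),
    }
```

Averaging the per-class F1 scores (rather than computing F1 from the averaged precision and recall) is one reason a macro F1 can come out below both the macro precision and the macro recall.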

7. Testing

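  Testing is done in streaming mode: the model receives 2 seconds of audio at a time. Each chunk is converted into a tensor of shape [1, 1, 64, 100] and passed to the model for inference. Every chunk produces an inference result, and a threshold determines whether the event has occurred.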

python model_test.py --model_type EAT-M-Transformer
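A minimal sketch of the streaming loop described above: the waveform is cut into consecutive 2-second windows, each window's 64-band feature map is cropped or padded to 100 frames and reshaped to [1, 1, 64, 100], and a per-class sigmoid score is compared with a threshold. The 16 kHz sample rate, the 0.5 threshold, the zero-padding of the last window and the `feature_fn`/`model_fn` callables are assumptions for illustration, not model_test.py's actual code.

```python
import numpy as np

SAMPLE_RATE = 16000   # assumed sample rate
WINDOW_SECONDS = 2.0  # window length described in the text
N_FRAMES = 100        # time frames in the model input [1, 1, 64, 100]


def stream_chunks(wave, sr=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Yield (start_time_in_seconds, chunk) for consecutive non-overlapping windows.

    The last, shorter window is zero-padded to full length (an assumption)."""
    size = int(sr * window_seconds)
    for start in range(0, len(wave), size):
        chunk = wave[start:start + size]
        if len(chunk) < size:
            chunk = np.pad(chunk, (0, size - len(chunk)))
        yield start / sr, chunk


def to_model_input(features, n_frames=N_FRAMES):
    """Crop or zero-pad a (n_mels, T) feature map to n_frames and add batch/channel axes."""
    feats = features[:, :n_frames]
    if feats.shape[1] < n_frames:
        feats = np.pad(feats, ((0, 0), (0, n_frames - feats.shape[1])))
    return feats[np.newaxis, np.newaxis].astype(np.float32)  # (1, 1, n_mels, n_frames)


def detect_events(wave, feature_fn, model_fn, labels, threshold=0.5):
    """Run the model on every window and list the classes whose score reaches the threshold."""
    results = []
    for start_time, chunk in stream_chunks(wave):
        x = to_model_input(feature_fn(chunk))
        logits = np.asarray(model_fn(x), dtype=np.float64).reshape(-1)
        scores = 1.0 / (1.0 + np.exp(-logits))  # per-class sigmoid scores
        detected = [labels[i] for i in np.flatnonzero(scores >= threshold)]
        results.append((start_time, detected))
    return results
```

With a trained PyTorch module `net` in eval mode, `model_fn` could be something like `lambda x: net(torch.from_numpy(x)).detach().numpy()`, and `labels` could be read from label_list.txt; `feature_fn` should be the same mel extraction used during training.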