A Summary of Pooling Methods (Speech Recognition)

The pooling layer converts variable-length frame-level features into a single fixed-length vector.

1. Statistics Pooling

Link: http://danielpovey.com/files/2017_interspeech_embeddings.pdf

The default pooling method for x-vector is statistics pooling.

The statistics pooling layer computes the mean vector µ and, as the second-order statistic, the standard deviation vector σ over the frame-level features h_t (t = 1, ..., T). The two vectors are concatenated to form the fixed-length utterance-level representation.
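A minimal NumPy sketch of statistics pooling, assuming the frame-level features are stored as a `(T, D)` array and that the standard deviation uses the population form (division by T). The function name `statistics_pooling` and the small clamp `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def statistics_pooling(h, eps=1e-8):
    """Pool frame-level features h of shape (T, D) into a vector of shape (2*D,)."""
    mu = h.mean(axis=0)                      # mean over the T frames
    var = (h * h).mean(axis=0) - mu * mu     # E[h^2] - (E[h])^2 for each dimension
    sigma = np.sqrt(np.maximum(var, eps))    # clamp guards against tiny negative values
    return np.concatenate([mu, sigma])

# Utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
short_utt = statistics_pooling(rng.normal(size=(50, 4)))
long_utt = statistics_pooling(rng.normal(size=(300, 4)))
print(short_utt.shape, long_utt.shape)  # (8,) (8,)
```

The output length depends only on the feature dimension D, which is what lets the layers after pooling operate on whole utterances.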

2. Attentive Statistics Pooling

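Link: https://arxiv.org/pdf/1803.10963.pdf

Within an utterance, the frame-level features of some frames are often more distinctive and important than those of other frames, so an attention mechanism is used to assign a different weight to each frame's features.

Here f(·) denotes the non-linear transformation used when computing the attention scores, such as the tanh or ReLU function.

Finally, the frame features are combined by a weighted sum using these attention weights.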
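The weighting described above can be sketched as follows. This assumes the scoring form e_t = v^T f(W h_t + b) + k with f = tanh, a softmax over frames, and a weighted mean plus weighted standard deviation as output, which is how the linked paper is usually summarised. The parameter names `W`, `b`, `v`, `k` and their shapes are assumptions of this sketch, and the parameters would be learned in practice rather than sampled at random.

```python
import numpy as np

def attentive_statistics_pooling(h, W, b, v, k=0.0, eps=1e-8):
    """h: (T, D) frame features; W: (D, A), b: (A,), v: (A,) are attention parameters."""
    scores = np.tanh(h @ W + b) @ v + k            # e_t: one scalar score per frame, shape (T,)
    scores = scores - scores.max()                 # stabilise the softmax numerically
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights over frames, sum to 1
    mu = alpha @ h                                 # weighted mean, shape (D,)
    var = alpha @ (h * h) - mu * mu                # weighted second moment minus mean squared
    sigma = np.sqrt(np.maximum(var, eps))          # weighted standard deviation
    return np.concatenate([mu, sigma]), alpha

rng = np.random.default_rng(0)
h = rng.normal(size=(20, 3))
W, b, v = rng.normal(size=(3, 5)), np.zeros(5), rng.normal(size=5)
pooled, alpha = attentive_statistics_pooling(h, W, b, v)
print(pooled.shape, alpha.shape)  # (6,) (20,)
```

When all scores are equal the weights become uniform and the result reduces to plain statistics pooling.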

3. Self-Attentive Pooling

Link: https://danielpovey.com/files/2018_interspeech_xvector_attention.pdf

4. Self Multi-Head Attention Pooling

Paper: Multi-Resolution Multi-Head Attention in Deep Speaker Embedding (IEEE Conference Publication, IEEE Xplore)

5. NetVLAD

Papers:

https://arxiv.org/pdf/1902.10107.pdf

https://arxiv.org/pdf/1511.07247.pdf

For a more detailed explanation, see "从VLAD到NetVLAD,再到NeXtVlad" (From VLAD to NetVLAD, and on to NeXtVLAD) on Zhihu.

6. Learnable Dictionary Encoding (LDE)

Paper: https://arxiv.org/pdf/1804.05160.pdf

The LDE layer introduces two groups of learnable parameters. One is the set of dictionary component centers, denoted µ = {µ_1, µ_2, ..., µ_C}. The other is the set of assigned weights, denoted w.

In the weight computation, the smoothing factor s_c for each dictionary center µ_c is learnable.
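A hedged sketch of how such a layer could be computed. It assumes the residual r_tc = x_t − µ_c, soft-assignment weights w_tc obtained by a softmax of −s_c‖r_tc‖² over the C components, and aggregation by a plain average over frames. The function name `lde_pooling`, the array shapes, and the averaging step are assumptions of this sketch; in a real model the centers and smoothing factors would be trained.

```python
import numpy as np

def lde_pooling(x, centers, s):
    """x: (T, D) frames; centers: (C, D) dictionary; s: (C,) smoothing factors."""
    r = x[:, None, :] - centers[None, :, :]       # residuals r_tc, shape (T, C, D)
    sq_dist = (r ** 2).sum(axis=-1)               # squared distances ||r_tc||^2, shape (T, C)
    logits = -s[None, :] * sq_dist
    logits -= logits.max(axis=1, keepdims=True)   # stabilise the softmax numerically
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)             # w_tc: softmax over the C components
    e = (w[:, :, None] * r).mean(axis=0)          # e_c = (1/T) * sum_t w_tc * r_tc, shape (C, D)
    return e.reshape(-1)                          # concatenate the C component vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
encoded = lde_pooling(x, centers=rng.normal(size=(4, 3)), s=np.ones(4))
print(encoded.shape)  # (12,)
```

A larger s_c makes the assignment to center c sharper, while a value near zero spreads each frame almost evenly across the dictionary.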

7. Attentive Bilinear Pooling (ABP) - Interspeech 2020

Paper: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1922.pdf

Let H ∈ ℝ^{L×D} be the frame-level feature map captured by the hidden layer below the self-attention layer, where L and D are the number of frames and the feature dimension, respectively. The attention map A ∈ ℝ^{K×L} is then obtained by feeding H into a 1×1 convolutional layer followed by a softmax non-linearity, where K is the number of attention heads. The first-order and second-order attentive statistics of H, denoted µ and σ², are computed in a manner similar to cross-layer bilinear pooling.

In that computation, T1(x) is the operation of reshaping x into a vector, and T2(x) consists of a signed square-root step followed by an L2-normalization step. The output of ABP is the concatenation of µ and σ².
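The description above can be sketched in NumPy as follows. The exact formulas are not reproduced in this article, so the sketch assumes µ = T2(T1(AH)) and σ² = T2(T1(A(H ⊙ H) − (AH) ⊙ (AH))), with the softmax taken over the L frames for each head. The weight matrix `W_att`, which stands in for the 1×1 convolution, and the function name are hypothetical.

```python
import numpy as np

def attentive_bilinear_pooling(H, W_att, eps=1e-12):
    """H: (L, D) frame-level feature map; W_att: (D, K) weights of the 1x1 convolution."""
    logits = (H @ W_att).T                        # (K, L): a 1x1 conv is a per-frame linear map
    logits = logits - logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)             # softmax over the L frames, one row per head
    first = A @ H                                 # (K, D) first-order attentive statistics
    second = A @ (H * H) - first * first          # (K, D) second-order attentive statistics

    def T1(x):                                    # reshape into a vector
        return x.reshape(-1)

    def T2(x):                                    # signed square root, then L2 normalisation
        y = np.sign(x) * np.sqrt(np.abs(x))
        return y / (np.linalg.norm(y) + eps)

    return np.concatenate([T2(T1(first)), T2(T1(second))])

rng = np.random.default_rng(0)
H = rng.normal(size=(30, 4))                      # L = 30 frames, D = 4
out = attentive_bilinear_pooling(H, rng.normal(size=(4, 2)))  # K = 2 heads
print(out.shape)  # (16,)
```

The output has 2·K·D entries, so increasing the number of heads widens the utterance-level representation.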

8. Short-time Spectral Pooling (STSP) - ICASSP 2021

Paper: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9414094&tag=1

From a Fourier perspective, statistics pooling only exploits the DC (zero-frequency) components in the spectral domain, whereas STSP incorporates more spectral components besides the DC ones during aggregation and is able to retain richer speaker information.

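1. Apply the STFT (short-time Fourier transform) to the features extracted by the convolutional layers; each channel yields a two-dimensional spectrogram.

2. Compute the averaged spectral array.

3. Compute the second-order spectral statistics.

4. Concatenate the two features (C is the number of channels).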
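The four steps above can be sketched as follows. The details are assumptions of this sketch rather than the paper's exact recipe: a rectangular window, magnitude spectra, the mean of the squared magnitudes as the second-order statistic, and keeping only the lowest `n_keep` frequency bins. The function name `stsp` and all parameter names are hypothetical.

```python
import numpy as np

def stsp(x, win=8, hop=4, n_keep=2):
    """x: (C, T) channel-wise feature sequences from the convolutional layers."""
    C, T = x.shape
    starts = range(0, T - win + 1, hop)
    # Step 1: cut each channel into short windows and take the FFT of every window.
    frames = np.stack([x[:, s:s + win] for s in starts], axis=1)   # (C, N, win)
    spec = np.abs(np.fft.rfft(frames, axis=-1))                    # magnitude spectrogram per channel
    # Step 2: average the spectra over the N windows.
    mean_spec = spec.mean(axis=1)[:, :n_keep]                      # (C, n_keep)
    # Step 3: second-order spectral statistics (mean squared magnitude).
    second = (spec ** 2).mean(axis=1)[:, :n_keep]                  # (C, n_keep)
    # Step 4: concatenate both sets of features across channels.
    return np.concatenate([mean_spec.reshape(-1), second.reshape(-1)])

rng = np.random.default_rng(0)
pooled = stsp(rng.normal(size=(3, 64)))   # C = 3 channels, T = 64 frames
print(pooled.shape)  # (12,)
```

With `n_keep=1` only the DC bin survives, which for a rectangular window is the window sum of each channel, so the first half of the output carries only mean-type information; keeping more bins is what adds the extra spectral components.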

9. Multi-head Attentive STSP - IEEE Trans. on Audio, Speech, and Language Processing 2022

One limitation of STSP is that the brute-force average of the spectrograms along the temporal axis ignores the importance of individual windowed segments when computing the spectral representations. In other words, all segments in a given spectrogram are treated as equally important.
