
Deploying Whisper speech recognition and evaluating WER

1. Deploying Whisper

For the detailed procedure, see: 🏠

Create the project folder

mkdir whisper
cd whisper

Create a conda virtual environment and activate it

conda create -n py310 python=3.10 -c conda-forge -y
conda activate py310

Install PyTorch

pip install --pre torch torchvision torchaudio --extra-index-url 

Install Whisper

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

Install the related packages

pip install tqdm
pip install numba
pip install tiktoken==0.3.3
brew install ffmpeg
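
Whisper calls the ffmpeg executable to decode audio, so it is worth confirming that the tool and the Python packages are visible from the active environment before running a transcription. A minimal sketch using only the standard library; the helper names missing_tools and missing_modules are made up for this example and are not part of Whisper:

```python
import importlib.util
import shutil


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing executables:", missing_tools(["ffmpeg"]))
    print("missing modules:",
          missing_modules(["torch", "whisper", "tqdm", "numba", "tiktoken"]))
```

Two empty lists mean everything installed in the steps above can be found.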

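Check that Whisper installed correctly (without --language, Whisper detects the language automatically, so a Mandarin recording is transcribed as Chinese)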

whisper test.wav --model small
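# test.wav is your own test file (mp3 is also supported); small selects the small model

When transcribing Chinese, Whisper often outputs Traditional characters. Adding the following parameters avoids this:

whisper test.wav --model small --language zh --initial_prompt "以下是普通话的句子。"
# Note: in these tests only this exact prompt, "以下是普通话的句子。", had the desired effect, so do not reword it casually.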
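
The same options are available from Python. A minimal sketch, assuming the openai-whisper package installed above: whisper.load_model and model.transcribe are the package's own entry points, while SIMPLIFIED_PROMPT, decode_options, and transcribe_mandarin are names invented for this example.

```python
# Prompt taken verbatim from the command above; it nudges the decoder
# toward Simplified Chinese output.
SIMPLIFIED_PROMPT = "以下是普通话的句子。"


def decode_options(language="zh", prompt=SIMPLIFIED_PROMPT):
    """Keyword arguments mirroring --language and --initial_prompt."""
    return {"language": language, "initial_prompt": prompt}


def transcribe_mandarin(path, model_name="small"):
    """Transcribe one audio file and return the recognized text."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **decode_options())
    return result["text"]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe_mandarin(sys.argv[1]))
```

Loading the model once and calling transcribe repeatedly avoids reloading the weights for every file, which the command-line form does on each invocation.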

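2. Batch testing with a script

Create a test.sh script with the following content. It runs Chinese speech recognition on the wav files in a folder one by one and stops at the first missing file number.

#!/bin/bash
for ((i=0;i<300;i++)); do
    file="wav/A13_${i}.wav"
    if [ ! -f "$file" ]; then
        break
    fi
    whisper "$file" --model medium --output_dir denied --language zh --initial_prompt "以下是普通话的句子。"
done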

For English speech recognition, change the script to:

#!/bin/bash
for ((i=0;i<300;i++)); do
    file="en/${i}.wav"
    if [ ! -f "$file" ]; then
        break
    fi
    whisper "$file" --model small --output_dir denied --language en
done
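
The same loop can be written in Python, which makes it easier to reuse one script for both languages. This is a sketch under the same assumptions as the shell scripts (consecutively numbered files, stop at the first missing one). numbered_wavs, whisper_command, and run_batch are illustrative names, and the flags are the ones used above.

```python
import subprocess
from pathlib import Path


def numbered_wavs(pattern, limit=300):
    """Yield pattern.format(i) for i = 0, 1, ... until a file is missing."""
    for i in range(limit):
        path = Path(pattern.format(i))
        if not path.is_file():
            break
        yield path


def whisper_command(path, model, language, output_dir="denied", prompt=None):
    """Build the argument list for one whisper CLI invocation."""
    cmd = ["whisper", str(path), "--model", model,
           "--output_dir", output_dir, "--language", language]
    if prompt is not None:
        cmd += ["--initial_prompt", prompt]
    return cmd


def run_batch(pattern, model, language, prompt=None):
    """Transcribe every numbered file, one whisper process per file."""
    for path in numbered_wavs(pattern):
        subprocess.run(whisper_command(path, model, language, prompt=prompt),
                       check=True)


if __name__ == "__main__":
    # Chinese set, as in the first script.
    run_batch("wav/A13_{}.wav", "medium", "zh", prompt="以下是普通话的句子。")
    # English set, as in the second script.
    run_batch("en/{}.wav", "small", "en")
```

Passing the arguments as a list avoids shell quoting problems with the Chinese prompt, and check=True stops the batch as soon as one transcription fails.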

3. Evaluating the results

Speech recognition is usually evaluated with WER (word error rate), which measures the quality of the speech-to-text output.
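
Concretely, WER counts the minimum number of word-level edits needed to turn the recognizer's output into the reference transcript, normalized by the reference length. In the standard definition, with S substitutions, D deletions, I insertions, and N words in the reference:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

For the reference "what is it" and the hypothesis "what is", one word is deleted, so S = 0, D = 1, I = 0, N = 3, and WER = 1/3 ≈ 33.33%. Because insertions are counted, WER can exceed 100%. For Chinese, the same formula is usually applied to characters rather than words.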

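Here we mainly use the python-wer code from an open-source GitHub project (🌟) to evaluate the results.

Our reference transcripts have the following form:

Whisper's predictions have the following form:

The text therefore has to be cleaned up (spaces and punctuation removed) before computing WER. The code is as follows:

(Adjust calculate_WER to your own data as needed.)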

import re
import numpy


def editDistance(r, h):
    '''
    Calculate the edit distance between the reference sentence and the hypothesis sentence.
    The main algorithm used is dynamic programming.
    Attributes:
        r -> the list of words produced by splitting the reference sentence.
        h -> the list of words produced by splitting the hypothesis sentence.
    '''
    # int32 instead of the original uint8, which overflows once a distance exceeds 255.
    d = numpy.zeros((len(r)+1, len(h)+1), dtype=numpy.int32)
    for i in range(len(r)+1):
        d[i][0] = i
    for j in range(len(h)+1):
        d[0][j] = j
    for i in range(1, len(r)+1):
        for j in range(1, len(h)+1):
            if r[i-1] == h[j-1]:
                d[i][j] = d[i-1][j-1]
            else:
                substitute = d[i-1][j-1] + 1
                insert = d[i][j-1] + 1
                delete = d[i-1][j] + 1
                d[i][j] = min(substitute, insert, delete)
    return d


def getStepList(r, h, d):
    '''
    Get the list of steps taken in the dynamic programming process.
    Attributes:
        r -> the list of words produced by splitting the reference sentence.
        h -> the list of words produced by splitting the hypothesis sentence.
        d -> the matrix built when calculating the edit distance of h and r.
    '''
    x = len(r)
    y = len(h)
    steps = []
    while True:
        if x == 0 and y == 0:
            break
        elif x >= 1 and y >= 1 and d[x][y] == d[x-1][y-1] and r[x-1] == h[y-1]:
            steps.append("e")
            x = x - 1
            y = y - 1
        elif y >= 1 and d[x][y] == d[x][y-1]+1:
            steps.append("i")
            y = y - 1
        elif x >= 1 and y >= 1 and d[x][y] == d[x-1][y-1]+1:
            steps.append("s")
            x = x - 1
            y = y - 1
        else:
            steps.append("d")
            x = x - 1
    return steps[::-1]


def alignedPrint(steps, r, h, result):
    '''
    Print the comparison of the reference and hypothesis sentences in an aligned way.
    Attributes:
        steps  -> the list of steps.
        r      -> the list of words produced by splitting the reference sentence.
        h      -> the list of words produced by splitting the hypothesis sentence.
        result -> the rate calculated based on edit distance.
    '''
    print("REF:", end=" ")
    for i in range(len(steps)):
        if steps[i] == "i":
            count = 0
            for j in range(i):
                if steps[j] == "d":
                    count += 1
            index = i - count
            print(" " * (len(h[index])), end=" ")
        elif steps[i] == "s":
            count1 = 0
            for j in range(i):
                if steps[j] == "i":
                    count1 += 1
            index1 = i - count1
            count2 = 0
            for j in range(i):
                if steps[j] == "d":
                    count2 += 1
            index2 = i - count2
            if len(r[index1]) < len(h[index2]):
                print(r[index1] + " " * (len(h[index2])-len(r[index1])), end=" ")
            else:
                print(r[index1], end=" ")
        else:
            count = 0
            for j in range(i):
                if steps[j] == "i":
                    count += 1
            index = i - count
            print(r[index], end=" ")
    print("\nHYP:", end=" ")
    for i in range(len(steps)):
        if steps[i] == "d":
            count = 0
            for j in range(i):
                if steps[j] == "i":
                    count += 1
            index = i - count
            print(" " * (len(r[index])), end=" ")
        elif steps[i] == "s":
            count1 = 0
            for j in range(i):
                if steps[j] == "i":
                    count1 += 1
            index1 = i - count1
            count2 = 0
            for j in range(i):
                if steps[j] == "d":
                    count2 += 1
            index2 = i - count2
            if len(r[index1]) > len(h[index2]):
                print(h[index2] + " " * (len(r[index1])-len(h[index2])), end=" ")
            else:
                print(h[index2], end=" ")
        else:
            count = 0
            for j in range(i):
                if steps[j] == "d":
                    count += 1
            index = i - count
            print(h[index], end=" ")
    print("\nEVA:", end=" ")
    for i in range(len(steps)):
        if steps[i] == "d":
            count = 0
            for j in range(i):
                if steps[j] == "i":
                    count += 1
            index = i - count
            print("D" + " " * (len(r[index])-1), end=" ")
        elif steps[i] == "i":
            count = 0
            for j in range(i):
                if steps[j] == "d":
                    count += 1
            index = i - count
            print("I" + " " * (len(h[index])-1), end=" ")
        elif steps[i] == "s":
            count1 = 0
            for j in range(i):
                if steps[j] == "i":
                    count1 += 1
            index1 = i - count1
            count2 = 0
            for j in range(i):
                if steps[j] == "d":
                    count2 += 1
            index2 = i - count2
            if len(r[index1]) > len(h[index2]):
                print("S" + " " * (len(r[index1])-1), end=" ")
            else:
                print("S" + " " * (len(h[index2])-1), end=" ")
        else:
            count = 0
            for j in range(i):
                if steps[j] == "i":
                    count += 1
            index = i - count
            print(" " * (len(r[index])), end=" ")
    print("\nWER: " + result)
    return result


def wer(r, h):
    """
    Calculate the word error rate in ASR.
    You can use it like this: wer("what is it".split(), "what is".split())
    """
    # build the matrix
    d = editDistance(r, h)
    # find out the manipulation steps
    steps = getStepList(r, h, d)
    # print the result in an aligned way
    result = float(d[len(r)][len(h)]) / len(r) * 100
    result = str("%.2f" % result) + "%"
    result = alignedPrint(steps, r, h, result)
    return result


# Compute the overall WER
def calculate_WER():
    # Whisper output (hypotheses); the first 11 characters of each line are skipped.
    # Both files are assumed to be UTF-8 encoded.
    with open("whisper_out.txt", "r", encoding="utf-8") as f:
        text1_list = [i[11:].strip("\n") for i in f.readlines()]
    # Ground-truth transcripts (references)
    with open("A13.txt", "r", encoding="utf-8") as f:
        text2_orgin_list = [i[11:].strip("\n") for i in f.readlines()]
    WER = 0
    symbols = ",@#¥%……&*()——+~!{}【】;‘:“”‘。?》《、"
    # calculate the distance between each pair of texts
    for i in range(len(text1_list)):
        # keep only the part starting at the first Chinese character
        match1 = re.search('[\u4e00-\u9fa5]', text1_list[i])
        if match1:
            index1 = match1.start()
        else:
            index1 = len(text1_list[i])
        match2 = re.search('[\u4e00-\u9fa5]', text2_orgin_list[i])
        if match2:
            index2 = match2.start()
        else:
            index2 = len(text2_orgin_list[i])
        # hypothesis: strip punctuation
        result1 = text1_list[i][index1:]
        result1 = result1.translate(str.maketrans('', '', symbols))
        # reference: strip spaces
        result2 = text2_orgin_list[i][index2:]
        result2 = result2.replace(" ", "")
        print(result1)
        print(result2)
        # Reference first, hypothesis second: wer() normalizes by the reference length.
        # Strings are passed directly, so the comparison is character by character.
        result = wer(result2, result1)
        WER += float(result.strip('%')) / 100
    # the overall figure is the mean of the per-sentence rates
    WER = WER / len(text1_list)
    print("Overall WER:", WER)
    print("Overall WER:", format(WER, '0.2%'))


calculate_WER()
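
calculate_WER above averages the per-sentence rates, so a short sentence weighs as much as a long one. A common alternative is a corpus-level rate: total edits divided by total reference length. The following is a minimal self-contained sketch in pure Python, not part of the python-wer project. normalize, edit_distance, and corpus_error_rate are illustrative names, treating every Unicode punctuation or symbol character as removable is an assumption of this sketch, and the units compared are characters.

```python
import unicodedata


def normalize(text):
    """Drop whitespace and every Unicode punctuation/symbol character."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] not in ("P", "S")
    )


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping two rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def corpus_error_rate(references, hypotheses):
    """Total edits over total reference length, after normalization."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = 0
    ref_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref, hyp = normalize(ref), normalize(hyp)
        edits += edit_distance(ref, hyp)
        ref_len += len(ref)
    if ref_len == 0:
        raise ValueError("references are empty after normalization")
    return edits / ref_len


if __name__ == "__main__":
    refs = ["今天 天气 很 好", "你好"]
    hyps = ["今天天气很好。", "您好!"]
    print(f"{corpus_error_rate(refs, hyps):.2%}")  # prints 12.50%
```

In the example, the first pair matches after normalization and the second differs in one of two characters, giving 1 edit over 8 reference characters.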

Running calculate_WER() prints results of the following form:

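4. Comparison with PaddleSpeech:

| Dataset | Data size | paddle (separate Chinese and English models) | paddle (single model) | whisper small (single model) | whisper medium (single model) |
| --- | --- | --- | --- | --- | --- |
| zhthchs30 (Chinese error rate) | 250 | 11.61% | 45.53% | 24.11% | 13.95% |
| LibriSpeech (English error rate) | 125 | 7.76% | 50.88% | 9.31% | 9.31% |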

5. Datasets used for testing

Open-source wav data, preprocessed by the author

