
Microsoft Azure Speech Recognition (ASR) Sample Demo

Object storage service (OSS) corresponds to Azure Blob Storage

Speech recognition (ASR) corresponds to Azure Speech-to-Text

Speech synthesis (TTS) corresponds to Azure Text-to-Speech

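This demo accepts either an uploaded .mp3 file or an OSS URL pointing to an audio file, and returns the text recognized from the audio.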
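For the URL mode, a client call might look like the sketch below. It targets the POST /asr/recognize endpoint and the `url` parameter defined by the controller in the Code section. The base address http://localhost:8080, the sample audio URL, and the class name RecognizeRequestSketch are assumptions made for illustration. The sketch only builds the request; sending it requires the demo application to be running.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper, not part of the original demo.
class RecognizeRequestSketch {

    // Encodes the audio URL as a form field named "url", matching the
    // controller's @RequestParam(value = "url").
    static String formBody(String audioUrl) {
        return "url=" + URLEncoder.encode(audioUrl, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a form-encoded POST to /asr/recognize.
    static HttpRequest build(String baseUrl, String audioUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/asr/recognize"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(audioUrl)))
                .build();
    }

    public static void main(String[] args) {
        // Assumed base address and sample URL; adjust both to your setup.
        String audioUrl = "https://example.com/audio/sample.mp3";
        HttpRequest request = build("http://localhost:8080", audioUrl);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody(audioUrl));
        // To send it against a running instance:
        // String text = java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString())
        //         .body();
    }
}
```

File uploads use the same endpoint with a multipart/form-data body whose part is named `file`.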

Dependencies

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Microsoft ASR -->
    <dependency>
        <groupId>com.microsoft.cognitiveservices.speech</groupId>
        <artifactId>client-sdk</artifactId>
        <version>1.43.0</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>io.projectreactor</groupId>
        <artifactId>reactor-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

Code

Configure the key and endpoint (azure.speech.key and azure.speech.endpoint) in application.properties or the equivalent YAML file.
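As a sketch of that configuration, the controller reads two properties, azure.speech.key and azure.speech.endpoint. The values below are placeholders, not real credentials, and the endpoint host pattern is an assumption; copy the actual key and endpoint from your Speech resource in the Azure portal.

```properties
# Placeholder values; replace them with those of your own Azure Speech resource.
azure.speech.key=<your-speech-resource-key>
# Assumed endpoint form; use the endpoint shown for your resource in the Azure portal.
azure.speech.endpoint=https://<your-region>.api.cognitive.microsoft.com/
```

Keeping the key out of version control (for example by supplying it through an environment variable) is advisable for anything beyond a local demo.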

package com.example.microsoftasr.controller;

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.net.URI;
import java.nio.file.Files;

@RestController
@RequestMapping("/asr")
public class TestController {

    @Value("${azure.speech.key}")
    private String speechKey;

    @Value("${azure.speech.endpoint}")
    private String speechEndpoint;

    @GetMapping("/hello")
    public String test() {
        return "Hello World";
    }

    @PostMapping("/recognize")
    public String recognize(@RequestParam(value = "file", required = false) MultipartFile file,
                            @RequestParam(value = "url", required = false) String ossUrl) {
        if ((file == null || file.isEmpty()) && (ossUrl == null || ossUrl.isBlank())) {
            return "No audio file or audio URL provided";
        }
        File tempInput = null;
        File tempWav = null;
        try {
            // 1. Save the original audio to a temporary file
            if (file != null && !file.isEmpty()) {
                String suffix = getSuffix(file.getOriginalFilename());
                tempInput = File.createTempFile("audio-input-", "." + suffix);
                file.transferTo(tempInput);
            } else {
                String suffix = getSuffix(ossUrl);
                tempInput = File.createTempFile("audio-input-", "." + suffix);
                try (var in = new java.net.URL(ossUrl).openStream()) {
                    Files.copy(in, tempInput.toPath(), java.nio.file.StandardCopyOption.REPLACE_EXISTING);
                }
            }

            // 2. Convert to WAV (16 kHz, mono)
            tempWav = File.createTempFile("audio-output-", ".wav");
            if (!getSuffix(tempInput.getName()).equalsIgnoreCase("wav")) {
                // The ffmpeg path is specific to the author's machine; adjust it for your environment
                ProcessBuilder pb = new ProcessBuilder(
                        "F:\\ffmpeg-7.1.1-full_build\\ffmpeg-7.1.1-full_build\\bin\\ffmpeg.exe", "-y",
                        "-i", tempInput.getAbsolutePath(),
                        "-ar", "16000",
                        "-ac", "1",
                        tempWav.getAbsolutePath());
                Process process = pb.inheritIO().start();
                int exitCode = process.waitFor();
                if (exitCode != 0) return "ffmpeg conversion failed, exitCode=" + exitCode;
            } else {
                // Input is already WAV: copy it as-is (no resampling is applied)
                Files.copy(tempInput.toPath(), tempWav.toPath(), java.nio.file.StandardCopyOption.REPLACE_EXISTING);
            }

            // 3. Call Microsoft ASR for recognition
            SpeechConfig speechConfig = SpeechConfig.fromEndpoint(new URI(speechEndpoint), speechKey);
            speechConfig.setSpeechRecognitionLanguage("zh-CN");
            try (AudioConfig audioConfig = AudioConfig.fromWavFileInput(tempWav.getAbsolutePath());
                 SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig)) {
                // recognizeOnceAsync performs single-shot recognition and returns one utterance;
                // longer audio needs continuous recognition instead
                SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
                if (result.getReason() == ResultReason.RecognizedSpeech) {
                    return result.getText();
                } else {
                    return "Recognition failed: " + result.getReason();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            return "Recognition error: " + e.getMessage();
        } finally {
            // Always clean up the temporary files
            try {
                if (tempInput != null) Files.deleteIfExists(tempInput.toPath());
                if (tempWav != null) Files.deleteIfExists(tempWav.toPath());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

    // Returns the text after the last '.', or "tmp" when there is none
    private String getSuffix(String filenameOrUrl) {
        if (filenameOrUrl == null || !filenameOrUrl.contains(".")) return "tmp";
        return filenameOrUrl.substring(filenameOrUrl.lastIndexOf('.') + 1);
    }
}
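One detail worth noting: getSuffix takes everything after the last dot, so a URL that carries a query string (for example a signed object-storage link) would yield a suffix containing `?`, `=`, or part of the signature, which can make File.createTempFile fail on some platforms. The following is a minimal sketch of a more defensive variant, not the author's implementation. The class name SuffixSketch, the method name suffixOf, and the eight-character limit on the extension are assumptions chosen for illustration.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of the original demo.
class SuffixSketch {

    // Extracts a file extension from a file name or URL, ignoring any
    // query string or fragment; falls back to "tmp" when none is usable.
    static String suffixOf(String filenameOrUrl) {
        if (filenameOrUrl == null || filenameOrUrl.isBlank()) return "tmp";
        String path = filenameOrUrl;
        try {
            // For URLs, keep only the path component (drops ?query and #fragment).
            String parsed = URI.create(filenameOrUrl).getPath();
            if (parsed != null && !parsed.isEmpty()) path = parsed;
        } catch (IllegalArgumentException e) {
            // Not a valid URI (e.g. a file name with spaces): use the raw string.
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return "tmp";
        String suffix = name.substring(dot + 1);
        // Accept only short alphanumeric extensions (assumed limit: 8 characters).
        return suffix.matches("[A-Za-z0-9]{1,8}") ? suffix.toLowerCase(Locale.ROOT) : "tmp";
    }

    public static void main(String[] args) {
        // The signed URL below is a made-up example.
        System.out.println(suffixOf("https://bucket.example.com/audio/sample.mp3?Expires=1700000000&Signature=ab.cd")); // prints "mp3"
        System.out.println(suffixOf("my recording.MP3")); // prints "mp3"
        System.out.println(suffixOf("noextension"));      // prints "tmp"
    }
}
```

Because the result is lower-cased and restricted to alphanumerics, it is safe to pass straight to File.createTempFile as in step 1 of the controller.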
