Chapter 5: Robot Perception
Fundamentals and applications of ROS1, following Guyue's (古月) course; hands-on work is done on the Hawkbot robot.
ROS version: ROS1 Melodic; physical robot: Hawkbot.
1. Machine Vision
1.1 Image Data in ROS (2D Images)
# 1. Install the required package
cd ~/willard_ws/src
git clone https://github.com/bosch-ros-pkg/usb_cam.git usb_cam
cd ..
catkin_make

# 2. Start the .launch file
roslaunch usb_cam usb_cam-test.launch

# 3. List the topics
rostopic list

# Output
==========================================================
/image_view/output
/image_view/parameter_descriptions
/image_view/parameter_updates
/rosout
/rosout_agg
/usb_cam/camera_info
/usb_cam/image_raw
/usb_cam/image_raw/compressed
/usb_cam/image_raw/compressed/parameter_descriptions
/usb_cam/image_raw/compressed/parameter_updates
/usb_cam/image_raw/compressedDepth
/usb_cam/image_raw/compressedDepth/parameter_descriptions
/usb_cam/image_raw/compressedDepth/parameter_updates
/usb_cam/image_raw/theora
/usb_cam/image_raw/theora/parameter_descriptions
/usb_cam/image_raw/theora/parameter_updates
==========================================================

# 4. Check the image message type
rostopic info /usb_cam/image_raw
==========================================================
Type: sensor_msgs/Image

Publishers:
 * /usb_cam (http://192.168.66.102:37379/)

Subscribers:
 * /image_view (http://192.168.66.102:43915/)
==========================================================

# 5. Inspect the image message definition
rosmsg show sensor_msgs/Image
==========================================================
std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
uint32 height
uint32 width
string encoding
uint8 is_bigendian
uint32 step
uint8[] data
==========================================================
header: message header containing the sequence number, the timestamp, and the coordinate frame the image is bound to;
height: vertical resolution of the image;
width: horizontal resolution of the image;
encoding: pixel encoding of the image (RGB, YUV, and other common formats); this does not involve image compression;
is_bigendian: endianness of the image data;
step: number of bytes in one image row, used as the data stride;
data: array holding the image data, step * height bytes in total;

# 6. Inspect the compressed image message definition
rosmsg show sensor_msgs/CompressedImage
===========================================
std_msgs/Header header
  uint32 seq
  time stamp
  string frame_id
string format
uint8[] data
===========================================
format: compression format of the image (jpeg, png, bmp);
data: array holding the compressed image data;
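The byte layout implied by these fields can be sanity-checked without ROS at all. The sketch below uses made-up resolution values and a bgr8 encoding (assumptions, not taken from a real camera) to show how step, height, and data relate:

```python
# Sketch of the sensor_msgs/Image memory layout (hypothetical values).
width, height = 640, 480          # horizontal / vertical resolution
bytes_per_pixel = 3               # "bgr8" encoding: 3 bytes per pixel
step = width * bytes_per_pixel    # bytes per image row (the data stride)

# The data array holds step * height bytes in total, as stated above.
data = bytearray(step * height)
assert len(data) == step * height

def pixel_offset(u, v):
    """Byte offset of pixel (u, v) inside the flat data array."""
    return v * step + u * bytes_per_pixel

# Write a red pixel (BGR order) at (60, 60) and read it back.
off = pixel_offset(60, 60)
data[off:off + 3] = bytes([0, 0, 255])
assert tuple(data[off:off + 3]) == (0, 0, 255)
```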
1.2 Image Data in ROS (3D Point Clouds)
# 1. Install freenect
sudo apt-get install ros-melodic-freenect-*

# 2. Install the low-level driver
git clone https://github.com/avin2/SensorKinect.git
cd SensorKinect/Bin/
tar xvf SensorKinect093-Bin-Linux-x64-v5.1.2.1.tar.bz2
cd Sensor-Bin-Linux-x64-v5.1.2.1/
sudo ./install.sh

# 3. Create freenect.launch under mbot_description/launch
# (the contents of freenect.launch are given in the next code block)
touch freenect.launch

# 4. Start the .launch file
roslaunch freenect_launch freenect.launch
rostopic info /camera/depth_registered/points

# 5. Inspect the point cloud message definition
rosmsg show sensor_msgs/PointCloud2
===========================================
height: vertical resolution of the point cloud;
width: horizontal resolution of the point cloud;
fields: data type of each point;
is_bigendian: endianness of the data;
point_step: byte stride of a single point;
row_step: byte stride of one row of data;
data: array holding the point cloud data, row_step * height bytes in total;
is_dense: whether the cloud is free of invalid points;

# 6. Start rviz
rosrun rviz rviz
# Contents of freenect.launch
<launch>
    <!-- start the freenect driver -->
    <include file="$(find freenect_launch)/launch/freenect.launch">
        <arg name="publish_tf" value="false" />
        <!-- use device registration -->
        <arg name="depth_registration" value="true" />
        <arg name="rgb_processing" value="true" />
        <arg name="ir_processing" value="false" />
        <arg name="depth_processing" value="false" />
        <arg name="depth_registered_processing" value="true" />
        <arg name="disparity_processing" value="false" />
        <arg name="disparity_registered_processing" value="false" />
        <arg name="sw_registered_processing" value="false" />
        <arg name="hw_registered_processing" value="true" />
    </include>
</launch>
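The PointCloud2 layout described above can likewise be illustrated without ROS. This sketch packs three hypothetical float32 XYZ points exactly the way fields, point_step, and row_step describe (assuming an unorganized cloud with height = 1; the point values are made up):

```python
import struct

# Hypothetical cloud: 3 points, each with float32 x, y, z fields.
fields = [("x", 0), ("y", 4), ("z", 8)]    # (name, byte offset) per field
point_step = 12                             # bytes per point (3 * 4)
width, height = 3, 1                        # unorganized cloud: height == 1
row_step = point_step * width               # bytes per row

points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]
data = b"".join(struct.pack("<fff", *p) for p in points)
assert len(data) == row_step * height       # total size, as stated above

def read_field(i, offset):
    """Read one float32 field of point i from the flat data buffer."""
    return struct.unpack_from("<f", data, i * point_step + offset)[0]

# Read back the z field of point 1 via its field offset.
assert read_field(1, dict(fields)["z"]) == 6.0
```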
1.3 Camera Calibration
Cameras place high demands on their optics. Due to imperfections inside and outside the camera, the captured image is usually distorted. To avoid errors introduced at the data source, the camera parameters must be calibrated.
# 1. Install the calibration package
sudo apt-get install ros-melodic-camera-calibration

# 2. Start the camera
roslaunch robot_vision usb_cam.launch

# 3. Start the calibration node
# Note: for a proper calibration, print the checkerboard on paper;
# using a tablet screen, as the author did here, is not best practice.
rosrun camera_calibration cameracalibrator.py --size 8x6 --square 0.024 image:=/usb_cam/image_raw camera:=/usb_cam
=========================
size: number of interior corners of the calibration checkerboard;
square: side length of each checkerboard square, in meters;
image, camera: the image topics published by the camera;

# 4. Once the X, Y, Size, and Skew bars all turn green, click CALIBRATE;
# when the SAVE button turns green, click SAVE to store the result.
# The calibration result is saved to /tmp/calibrationdata.tar.gz

# 5. Use the calibration file
# Copy the .yaml file from /tmp/calibrationdata.tar.gz into the package,
# then reference the .yaml file in usb_cam_with_calibration.launch
# Contents of usb_cam_with_calibration.launch
<launch>
    <node name="usb_cam" pkg="usb_cam" type="usb_cam_node" output="screen" >
        <param name="video_device" value="/dev/video0" />
        <param name="image_width" value="1280" />
        <param name="image_height" value="720" />
        <param name="pixel_format" value="yuyv" />
        <param name="camera_frame_id" value="usb_cam" />
        <param name="io_method" value="mmap"/>
        <param name="camera_info_url" type="string" value="file://$(find robot_vision)/camera_calibration.yaml" />
    </node>
</launch>
The four indicators during calibration:
- X: left-right movement of the calibration target in the camera's field of view;
- Y: up-down movement of the calibration target in the camera's field of view;
- Size: forward-backward movement of the calibration target in the camera's field of view;
- Skew: tilting rotation of the calibration target in the camera's field of view;
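What the calibration actually estimates are the camera intrinsics plus distortion coefficients. As a rough illustration (not part of the calibration pipeline itself), here is the radial term of the commonly used Brown–Conrady distortion model, with made-up coefficient values:

```python
def radial_distort(x, y, k1, k2):
    """Apply radial distortion to a normalized image point (x, y).

    x, y are normalized camera coordinates (after dividing by depth);
    k1, k2 are the first two radial distortion coefficients, which the
    calibration procedure estimates.
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# With zero coefficients the point is unchanged.
assert radial_distort(0.3, 0.4, 0.0, 0.0) == (0.3, 0.4)

# Here r2 = 0.25, so the scale factor is 1 + 0.1 * 0.25 = 1.025.
xd, yd = radial_distort(0.3, 0.4, 0.1, 0.0)
assert abs(xd - 0.3 * 1.025) < 1e-12
```

Undistortion (what the calibrated driver applies to /usb_cam/image_raw) inverts this mapping numerically.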
1.4 ROS+OpenCV
1.4.1 OpenCV Example Test
Overview of OpenCV:
- Open Source Computer Vision Library;
- a cross-platform open-source computer vision library released under the BSD license (Linux, Windows, macOS, and more);
- built from a set of C functions and a small number of C++ classes, with interfaces for C++, Python, Ruby, MATLAB, and other languages;
- implements many general-purpose algorithms in image processing and computer vision, and is free for both non-commercial and commercial use;
- can access hardware cameras directly and ships with a simple GUI module, highgui;
# 1. Install OpenCV
sudo apt-get install ros-melodic-vision-opencv libopencv-dev python-opencv

# 2. Run the test example
roslaunch robot_vision usb_cam.launch
rosrun robot_vision cv_bridge_test.py
rqt_image_view
1.4.2 Code Walkthrough
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import rospy
import cv2
from cv_bridge import CvBridge, CvBridgeError
from sensor_msgs.msg import Image

class image_converter:
    def __init__(self):
        # Create cv_bridge; declare the image publisher and subscriber
        self.image_pub = rospy.Publisher("cv_bridge_image", Image, queue_size=1)
        self.bridge = CvBridge()
        self.image_sub = rospy.Subscriber("/usb_cam/image_raw", Image, self.callback)

    def callback(self, data):
        # Convert the ROS image message to OpenCV format via cv_bridge
        try:
            cv_image = self.bridge.imgmsg_to_cv2(data, "bgr8")
        except CvBridgeError as e:
            print(e)

        # Draw a circle on the OpenCV image as a marker
        (rows, cols, channels) = cv_image.shape
        if cols > 60 and rows > 60:
            cv2.circle(cv_image, (60, 60), 30, (0, 0, 255), -1)

        # Display the OpenCV-format image
        cv2.imshow("Image window", cv_image)
        cv2.waitKey(3)

        # Convert the OpenCV image back into a ROS Image message and publish it
        try:
            self.image_pub.publish(self.bridge.cv2_to_imgmsg(cv_image, "bgr8"))
        except CvBridgeError as e:
            print(e)

if __name__ == '__main__':
    try:
        # Initialize the ROS node
        rospy.init_node("cv_bridge_test")
        rospy.loginfo("Starting cv_bridge_test node")
        image_converter()
        rospy.spin()
    except KeyboardInterrupt:
        print("Shutting down cv_bridge_test node.")
        cv2.destroyAllWindows()
- imgmsg_to_cv2(): converts a ROS image message into OpenCV image data;
- cv2_to_imgmsg(): converts OpenCV image data into a ROS image message;
- Input arguments:
  - the image message;
  - the target image encoding;
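In essence, imgmsg_to_cv2() reinterprets the flat data buffer of the message as a height × width × channels array (real cv_bridge returns a NumPy array; the pure-Python sketch below only mimics the indexing, with a made-up 2×2 image):

```python
def imgmsg_bytes_to_rows(data, height, width, channels, step):
    """Reinterpret a flat image byte buffer as nested [row][col][channel] lists.

    This mimics what cv_bridge's imgmsg_to_cv2() does with NumPy: each row
    starts at a multiple of step (which may exceed width * channels when
    rows are padded).
    """
    rows = []
    for v in range(height):
        row = []
        for u in range(width):
            off = v * step + u * channels
            row.append(list(data[off:off + channels]))
        rows.append(row)
    return rows

# 2x2 "bgr8" image: blue, green / red, white
data = bytes([255, 0, 0,   0, 255, 0,
              0, 0, 255,   255, 255, 255])
img = imgmsg_bytes_to_rows(data, height=2, width=2, channels=3, step=6)
assert img[1][0] == [0, 0, 255]   # pixel (row 1, col 0) is red in BGR order
```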
1.4.3 Marker (QR Code) Recognition

# 1. Install the marker recognition package
sudo apt-get install ros-melodic-ar-track-alvar

# 2. Start the ROS master
roscore

# 3. Common marker creation commands
rosrun ar_track_alvar createMarker
rosrun ar_track_alvar createMarker 0

# 4. Create markers with size 5
# (the markers are created under the config folder)
roscd robot_vision/config
rosrun ar_track_alvar createMarker -s 5 0
rosrun ar_track_alvar createMarker -s 5 1
rosrun ar_track_alvar createMarker -s 5 2
rosrun ar_track_alvar createMarker -s 5 3

# 5. Start the .launch files
roslaunch robot_vision usb_cam_with_calibration.launch
roslaunch robot_vision ar_track_camera.launch

# 6. View the detected marker poses
rostopic echo /ar_pose_marker
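Each marker reported on /ar_pose_marker carries a pose whose orientation is a quaternion. A minimal sketch of extracting the yaw angle (rotation about Z) from such a quaternion; the sample quaternion below is made up, not real detector output:

```python
import math

def yaw_from_quaternion(x, y, z, w):
    """Extract the yaw (rotation about Z) from a unit quaternion,
    using the standard ZYX Euler convention."""
    return math.atan2(2.0 * (w * z + x * y),
                      1.0 - 2.0 * (y * y + z * z))

# A quaternion for a pure 90-degree rotation about Z:
q = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
assert abs(yaw_from_quaternion(*q) - math.pi / 2) < 1e-9
```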
# usb_cam_with_calibration.launch
<launch>
    <node name="usb_cam" pkg="usb_cam" type="usb_cam_node" output="screen" >
        <param name="video_device" value="/dev/video0" />
        <param name="image_width" value="1280" />
        <param name="image_height" value="720" />
        <param name="pixel_format" value="yuyv" />
        <param name="camera_frame_id" value="usb_cam" />
        <param name="io_method" value="mmap"/>
        <param name="camera_info_url" type="string" value="file://$(find robot_vision)/camera_calibration.yaml" />
    </node>
</launch>
# ar_track_camera.launch
<launch>
    <node pkg="tf" type="static_transform_publisher" name="world_to_cam" args="0 0 0.5 0 1.57 0 world usb_cam 10" />

    <arg name="marker_size" default="5" />
    <arg name="max_new_marker_error" default="0.08" />
    <arg name="max_track_error" default="0.2" />
    <arg name="cam_image_topic" default="/usb_cam/image_raw" />
    <arg name="cam_info_topic" default="/usb_cam/camera_info" />
    <arg name="output_frame" default="/usb_cam" />

    <node name="ar_track_alvar" pkg="ar_track_alvar" type="individualMarkersNoKinect" respawn="false" output="screen">
        <param name="marker_size" type="double" value="$(arg marker_size)" />
        <param name="max_new_marker_error" type="double" value="$(arg max_new_marker_error)" />
        <param name="max_track_error" type="double" value="$(arg max_track_error)" />
        <param name="output_frame" type="string" value="$(arg output_frame)" />
        <remap from="camera_image" to="$(arg cam_image_topic)" />
        <remap from="camera_info" to="$(arg cam_info_topic)" />
    </node>

    <!-- rviz view -->
    <node pkg="rviz" type="rviz" name="rviz" args="-d $(find robot_vision)/config/ar_track_camera.rviz"/>
</launch>
2. Robot Speech

- Common packages

  - pocketsphinx

    Integrates code from the CMU Sphinx and Festival open-source projects to provide speech recognition.

  - audio_common

    Provides text-to-speech (TTS) functionality, realizing the idea of "making the robot talk".

  - AIML

    Artificial Intelligence Markup Language, an XML dialect for creating natural-language software agents.

- iFlytek (科大讯飞) SDK
# 1. Copy the iFlytek SDK library into the system directory
sudo cp libmsc.so /usr/lib/libmsc.so

# 2. Implement the speech dictation function (code in the next block)

# 3. Add the build targets (the full CMakeLists.txt is given below)
gedit CMakeLists.txt
add_executable(iat_publish
  src/iat_publish.cpp
  src/speech_recognizer.c
  src/linuxrec.c
)
target_link_libraries(iat_publish
  ${catkin_LIBRARIES}
  libmsc.so -ldl -lpthread -lm -lrt -lasound
)

# 4. Test speech dictation
# Note: if nothing happens after you speak, first check that the VM sound card is enabled
roscore
rosrun robot_voice iat_publish
rostopic pub /voiceWakeup std_msgs/String "data: 'Hello'"
- Speech dictation
// Contents of iat_publish.cpp
/*
 * Speech dictation (iFly Auto Transform) transcribes speech into text in real time.
 */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include "robot_voice/qisr.h"
#include "robot_voice/msp_cmn.h"
#include "robot_voice/msp_errors.h"
#include "robot_voice/speech_recognizer.h"
#include <iconv.h>

#include "ros/ros.h"
#include "std_msgs/String.h"

#define FRAME_LEN 640
#define BUFFER_SIZE 4096

int wakeupFlag = 0;
int resultFlag = 0;

static void show_result(char *string, char is_over)
{
    resultFlag = 1;
    printf("\rResult: [ %s ]", string);
    if (is_over)
        putchar('\n');
}

static char *g_result = NULL;
static unsigned int g_buffersize = BUFFER_SIZE;

void on_result(const char *result, char is_last)
{
    if (result) {
        size_t left = g_buffersize - 1 - strlen(g_result);
        size_t size = strlen(result);
        if (left < size) {
            g_result = (char*)realloc(g_result, g_buffersize + BUFFER_SIZE);
            if (g_result)
                g_buffersize += BUFFER_SIZE;
            else {
                printf("mem alloc failed\n");
                return;
            }
        }
        strncat(g_result, result, size);
        show_result(g_result, is_last);
    }
}

void on_speech_begin()
{
    if (g_result) {
        free(g_result);
    }
    g_result = (char*)malloc(BUFFER_SIZE);
    g_buffersize = BUFFER_SIZE;
    memset(g_result, 0, g_buffersize);
    printf("Start Listening...\n");
}

void on_speech_end(int reason)
{
    if (reason == END_REASON_VAD_DETECT)
        printf("\nSpeaking done \n");
    else
        printf("\nRecognizer error %d\n", reason);
}

/* demo: recognize audio from the microphone */
static void demo_mic(const char* session_begin_params)
{
    int errcode;
    int i = 0;
    struct speech_rec iat;
    struct speech_rec_notifier recnotifier = {
        on_result,
        on_speech_begin,
        on_speech_end
    };

    errcode = sr_init(&iat, session_begin_params, SR_MIC, &recnotifier);
    if (errcode) {
        printf("speech recognizer init failed\n");
        return;
    }
    errcode = sr_start_listening(&iat);
    if (errcode) {
        printf("start listen failed %d\n", errcode);
    }
    /* demo: record for 10 seconds */
    while (i++ < 10)
        sleep(1);
    errcode = sr_stop_listening(&iat);
    if (errcode) {
        printf("stop listening failed %d\n", errcode);
    }
    sr_uninit(&iat);
}

/*
 * main thread:   start/stop recording; query the recognition result
 * record thread: record callback (data write)
 * helper thread: ui (keystroke detection)
 */
void WakeUp(const std_msgs::String::ConstPtr& msg)
{
    printf("waking up\r\n");
    usleep(700*1000);
    wakeupFlag = 1;
}

int main(int argc, char* argv[])
{
    // Initialize ROS
    ros::init(argc, argv, "voiceRecognition");
    ros::NodeHandle n;
    ros::Rate loop_rate(10);

    // Subscribe to the wake-up signal; publish the recognized text
    ros::Subscriber wakeUpSub = n.subscribe("voiceWakeup", 1000, WakeUp);
    ros::Publisher voiceWordsPub = n.advertise<std_msgs::String>("voiceWords", 1000);

    ROS_INFO("Sleeping...");
    int count = 0;
    while (ros::ok())
    {
        // Woken up: run speech recognition
        if (wakeupFlag) {
            ROS_INFO("Wakeup...");
            int ret = MSP_SUCCESS;
            const char* login_params = "appid = 594a7b46, work_dir = .";
            const char* session_begin_params =
                "sub = iat, domain = iat, language = zh_cn, "
                "accent = mandarin, sample_rate = 16000, "
                "result_type = plain, result_encoding = utf8";
            ret = MSPLogin(NULL, NULL, login_params);
            if (MSP_SUCCESS != ret) {
                MSPLogout();
                printf("MSPLogin failed , Error code %d.\n", ret);
            }
            printf("Demo recognizing the speech from microphone\n");
            printf("Speak in 10 seconds\n");
            demo_mic(session_begin_params);
            printf("10 sec passed\n");
            wakeupFlag = 0;
            MSPLogout();
        }

        // Recognition finished: publish the result
        if (resultFlag) {
            resultFlag = 0;
            std_msgs::String msg;
            msg.data = g_result;
            voiceWordsPub.publish(msg);
        }

        ros::spinOnce();
        loop_rate.sleep();
        count++;
    }

    MSPLogout();   // Logout...
    return 0;
}
- The subscriber receives the voice wake-up signal; once it arrives, the wakeupFlag variable is set.
- The main loop then calls the SDK's dictation function; on successful recognition it sets resultFlag, and the Publisher publishes the recognized string.
# Contents of CMakeLists.txt (the auto-generated catkin template comments are omitted)
cmake_minimum_required(VERSION 2.8.3)
project(robot_voice)

## Add support for C++11, supported in ROS Kinetic and newer
add_definitions(-std=c++11)

find_package(catkin REQUIRED COMPONENTS
  roscpp
  rospy
  std_msgs
)

catkin_package(
#  INCLUDE_DIRS include
#  LIBRARIES robot_voice
#  CATKIN_DEPENDS roscpp rospy std_msgs
#  DEPENDS system_lib
)

include_directories(
  ${catkin_INCLUDE_DIRS}
  include
)

add_executable(tts_subscribe src/tts_subscribe.cpp)
target_link_libraries(tts_subscribe
  ${catkin_LIBRARIES}
  libmsc.so -ldl -pthread
)

add_executable(iat_publish
  src/iat_publish.cpp
  src/speech_recognizer.c
  src/linuxrec.c
)
target_link_libraries(iat_publish
  ${catkin_LIBRARIES}
  libmsc.so -ldl -lpthread -lm -lrt -lasound
)

add_executable(voice_assistant src/voice_assistant.cpp)
target_link_libraries(voice_assistant
  ${catkin_LIBRARIES}
  libmsc.so -ldl -pthread
)
Speech dictation in action:
- Speech synthesis
// Contents of tts_subscribe.cpp
/*
 * Speech synthesis (Text To Speech, TTS) converts arbitrary text into
 * continuous natural speech in real time.
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#include "robot_voice/qtts.h"
#include "robot_voice/msp_cmn.h"
#include "robot_voice/msp_errors.h"

#include "ros/ros.h"
#include "std_msgs/String.h"

#include <sstream>
#include <sys/types.h>
#include <sys/stat.h>

/* wav audio header format */
typedef struct _wave_pcm_hdr
{
    char riff[4];              // = "RIFF"
    int  size_8;               // = FileSize - 8
    char wave[4];              // = "WAVE"
    char fmt[4];               // = "fmt "
    int  fmt_size;             // = size of the next block: 16
    short int format_tag;      // = PCM: 1
    short int channels;        // = number of channels: 1
    int  samples_per_sec;      // = sample rate: 8000 | 6000 | 11025 | 16000
    int  avg_bytes_per_sec;    // = bytes per second: samples_per_sec * bits_per_sample / 8
    short int block_align;     // = bytes per sample point: wBitsPerSample / 8
    short int bits_per_sample; // = quantization bits: 8 | 16
    char data[4];              // = "data"
    int  data_size;            // = raw data length: FileSize - 44
} wave_pcm_hdr;

/* default wav header */
wave_pcm_hdr default_wav_hdr =
{
    { 'R', 'I', 'F', 'F' },
    0,
    { 'W', 'A', 'V', 'E' },
    { 'f', 'm', 't', ' ' },
    16,
    1,
    1,
    16000,
    32000,
    2,
    16,
    { 'd', 'a', 't', 'a' },
    0
};

/* text synthesis */
int text_to_speech(const char* src_text, const char* des_path, const char* params)
{
    int          ret          = -1;
    FILE*        fp           = NULL;
    const char*  sessionID    = NULL;
    unsigned int audio_len    = 0;
    wave_pcm_hdr wav_hdr      = default_wav_hdr;
    int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;

    if (NULL == src_text || NULL == des_path)
    {
        printf("params is error!\n");
        return ret;
    }
    fp = fopen(des_path, "wb");
    if (NULL == fp)
    {
        printf("open %s error.\n", des_path);
        return ret;
    }
    /* begin synthesis */
    sessionID = QTTSSessionBegin(params, &ret);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionBegin failed, error code: %d.\n", ret);
        fclose(fp);
        return ret;
    }
    ret = QTTSTextPut(sessionID, src_text, (unsigned int)strlen(src_text), NULL);
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSTextPut failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "TextPutError");
        fclose(fp);
        return ret;
    }
    printf("Synthesizing ...\n");
    fwrite(&wav_hdr, sizeof(wav_hdr), 1, fp);      // write the wav header (16000 Hz)
    while (1)
    {
        /* fetch synthesized audio */
        const void* data = QTTSAudioGet(sessionID, &audio_len, &synth_status, &ret);
        if (MSP_SUCCESS != ret)
            break;
        if (NULL != data)
        {
            fwrite(data, audio_len, 1, fp);
            wav_hdr.data_size += audio_len;        // accumulate data_size
        }
        if (MSP_TTS_FLAG_DATA_END == synth_status)
            break;
        printf(">");
        usleep(150*1000);                          // avoid hogging the CPU
    }
    // see the iFlytek cloud API docs for the possible synth_status values
    printf("\n");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSAudioGet failed, error code: %d.\n", ret);
        QTTSSessionEnd(sessionID, "AudioGetError");
        fclose(fp);
        return ret;
    }
    /* fix up the sizes in the wav header */
    wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);
    /* write the corrected values back into the header; the file is wav format */
    fseek(fp, 4, 0);
    fwrite(&wav_hdr.size_8, sizeof(wav_hdr.size_8), 1, fp);        // write size_8
    fseek(fp, 40, 0);                                              // seek to data_size
    fwrite(&wav_hdr.data_size, sizeof(wav_hdr.data_size), 1, fp);  // write data_size
    fclose(fp);
    fp = NULL;
    /* synthesis finished */
    ret = QTTSSessionEnd(sessionID, "Normal");
    if (MSP_SUCCESS != ret)
    {
        printf("QTTSSessionEnd failed, error code: %d.\n", ret);
    }
    return ret;
}

void voiceWordsCallback(const std_msgs::String::ConstPtr& msg)
{
    const char* text;
    int ret = MSP_SUCCESS;
    const char* session_begin_params =
        "voice_name = xiaoyan, text_encoding = utf8, sample_rate = 16000, "
        "speed = 50, volume = 50, pitch = 50, rdn = 2";
    const char* filename = "tts_sample.wav";   // name of the synthesized audio file

    std::cout << "I heard :" << msg->data.c_str() << std::endl;
    text = msg->data.c_str();

    /* synthesize the text */
    printf("Synthesis started ...\n");
    ret = text_to_speech(text, filename, session_begin_params);
    if (MSP_SUCCESS != ret)
    {
        printf("text_to_speech failed, error code: %d.\n", ret);
    }
    printf("Synthesis finished\n");

    unlink("/tmp/cmd");
    mkfifo("/tmp/cmd", 0777);
    popen("mplayer -quiet -slave -input file=/tmp/cmd 'tts_sample.wav'", "r");
    sleep(3);
}

void toExit()
{
    printf("Press any key to exit ...\n");
    getchar();
    MSPLogout();   // log out
}

int main(int argc, char* argv[])
{
    int ret = MSP_SUCCESS;
    // login parameters; the appid is bound to the msc library, do not change it
    const char* login_params = "appid = 594a7b46, work_dir = .";
    /*
     * rdn:           digit reading style
     * volume:        output volume
     * pitch:         output pitch
     * speed:         output speed
     * voice_name:    speaker voice
     * sample_rate:   output sample rate
     * text_encoding: input text encoding
     * See the iFlytek MSC API documentation for details.
     */
    /* user login; username/password can be obtained at http://open.voicecloud.cn */
    ret = MSPLogin(NULL, NULL, login_params);
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
        toExit();
        return ret;
    }

    printf("\n## TTS demo: converts arbitrary text into natural speech in real time ##\n\n");

    ros::init(argc, argv, "TextToSpeech");
    ros::NodeHandle n;
    ros::Subscriber sub = n.subscribe("voiceWords", 1000, voiceWordsCallback);
    ros::spin();

    printf("Press any key to exit ...\n");
    getchar();
    MSPLogout();   // log out
    return 0;
}
- The main function declares a subscriber on the voiceWords topic to receive the input text string;
- the callback voiceWordsCallback uses the SDK interface to convert the string into Chinese speech;
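The header fix-up inside text_to_speech() (seeking back to byte offsets 4 and 40) follows directly from the canonical 44-byte PCM WAV header. A Python sketch that packs the same header as the wave_pcm_hdr struct and checks those offsets:

```python
import struct

def wav_header(data_size, sample_rate=16000, channels=1, bits=16):
    """Pack a canonical 44-byte PCM WAV header, mirroring wave_pcm_hdr."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size,      # size_8 = FileSize - 8
        b"WAVE", b"fmt ", 16,
        1, channels,                  # PCM, channel count
        sample_rate, byte_rate,
        block_align, bits,
        b"data", data_size,
    )

hdr = wav_header(1000)
assert len(hdr) == 44
# size_8 lives at byte offset 4 and data_size at offset 40 -- exactly the
# positions text_to_speech() seeks to before rewriting them.
assert struct.unpack_from("<I", hdr, 4)[0] == 1036
assert struct.unpack_from("<I", hdr, 40)[0] == 1000
```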
# 1. Write the code
touch tts_subscribe.cpp

# 2. Edit CMakeLists.txt (full contents shown in the previous section)
gedit CMakeLists.txt
=====================================================
add_executable(tts_subscribe src/tts_subscribe.cpp)
target_link_libraries(tts_subscribe
  ${catkin_LIBRARIES}
  libmsc.so -ldl -pthread
)
=====================================================

# 3. Build in the workspace
cd ~/willard_ws/
catkin_make
source devel/setup.bash

# 4. Run the example
roscore
rosrun robot_voice tts_subscribe
rostopic pub /voiceWords std_msgs/String "data: '欢迎来到伏羲科技'"

# 5. If you hit the error "sh: 1: mplayer: not found":
sudo apt-get install mplayer

# 6. If "rosrun robot_voice tts_subscribe" reports:
=====================================================
# do_connect: could not connect to socket
# connect: No such file or directory
# Failed to open LIRC support. You will not be able to use your remote control.
=====================================================
# Fix: add the following line to $HOME/.mplayer/config:
lirc=no
Speech synthesis in action:
- Intelligent voice assistant
// Contents of voice_assistant.cpp
// The headers, the wave_pcm_hdr struct, default_wav_hdr, text_to_speech()
// and toExit() are identical to tts_subscribe.cpp above; the parts that
// differ are shown in full here.
#include <time.h>   // additionally needed for the current-time query

std::string to_string(int val)
{
    char buf[20];
    sprintf(buf, "%d", val);
    return std::string(buf);
}

void voiceWordsCallback(const std_msgs::String::ConstPtr& msg)
{
    const char* text;
    int ret = MSP_SUCCESS;
    const char* session_begin_params =
        "voice_name = xiaoyan, text_encoding = utf8, sample_rate = 16000, "
        "speed = 50, volume = 50, pitch = 50, rdn = 2";
    const char* filename = "tts_sample.wav";   // name of the synthesized audio file

    std::cout << "I heard :" << msg->data.c_str() << std::endl;

    std::string dataString = msg->data;
    if (dataString.compare("小胖小胖。") == 0)
    {
        text = "我在。主人您请说。";            // wake word -> acknowledge
    }
    else if (dataString.compare("你可以做什么?") == 0)
    {
        text = "我可以给您介绍伏羲科技公司";
    }
    else if (dataString.compare("开始介绍。") == 0)
    {
        text = "伏羲科技专注于AGV的研发。";
    }
    else if (dataString.compare("你还可以做什么?") == 0)
    {
        text = "你还可以问我现在时间";
    }
    else if (dataString.compare("现在时间。") == 0)
    {
        // query the current time
        time_t ts = time(NULL);
        struct tm *ptm = localtime(&ts);
        static std::string reply;   // static so the pointer stays valid below
        reply = "现在时间" + to_string(ptm->tm_hour) + "点"
                           + to_string(ptm->tm_min) + "分";
        text = reply.c_str();
    }
    else
    {
        text = msg->data.c_str();   // unknown input is echoed back
    }
    std::cout << text << std::endl;

    /* synthesize the reply */
    printf("Synthesis started ...\n");
    ret = text_to_speech(text, filename, session_begin_params);
    if (MSP_SUCCESS != ret)
    {
        printf("text_to_speech failed, error code: %d.\n", ret);
    }
    printf("Synthesis finished\n");

    unlink("/tmp/cmd");
    mkfifo("/tmp/cmd", 0777);
    popen("mplayer -quiet -slave -input file=/tmp/cmd 'tts_sample.wav'", "r");
    sleep(3);
}

int main(int argc, char* argv[])
{
    // login parameters; the appid is bound to the msc library, do not change it
    int ret = MSPLogin(NULL, NULL, "appid = 594a7b46, work_dir = .");
    if (MSP_SUCCESS != ret)
    {
        printf("MSPLogin failed, error code: %d.\n", ret);
        toExit();
        return ret;
    }
    ros::init(argc, argv, "TextToSpeech");
    ros::NodeHandle n;
    ros::Subscriber sub = n.subscribe("voiceWords", 1000, voiceWordsCallback);
    ros::spin();
    MSPLogout();   // log out
    return 0;
}
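The if/else chain in voiceWordsCallback is essentially a lookup table from recognized phrases to canned replies; the same dialogue logic can be sketched as a dictionary (phrases and replies copied from the code above, with the dynamic time reply handled separately):

```python
# -*- coding: utf-8 -*-
# Reply table mirroring the if/else chain in voice_assistant.cpp.
# Keys are the exact phrases the recognizer is expected to produce.
REPLIES = {
    u"小胖小胖。": u"我在。主人您请说。",
    u"你可以做什么?": u"我可以给您介绍伏羲科技公司",
    u"开始介绍。": u"伏羲科技专注于AGV的研发。",
    u"你还可以做什么?": u"你还可以问我现在时间",
}

def reply_for(heard, now=None):
    """Pick the reply for a recognized phrase.

    Unknown input is echoed back, as in the C++ code; the current-time
    branch is handled separately because its reply is dynamic.
    """
    if heard == u"现在时间。":
        import time
        t = now if now is not None else time.localtime()
        return u"现在时间%d点%d分" % (t.tm_hour, t.tm_min)
    return REPLIES.get(heard, heard)

assert reply_for(u"小胖小胖。") == u"我在。主人您请说。"
assert reply_for(u"hello") == u"hello"   # unknown input is echoed
```

A table like this is easier to extend than a chain of compare() calls: adding a new command is one dictionary entry rather than another branch.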
# 1. Write the code
touch voice_assistant.cpp

# 2. Edit CMakeLists.txt (full contents shown in the previous section)
gedit CMakeLists.txt
=====================================================
add_executable(voice_assistant src/voice_assistant.cpp)
target_link_libraries(voice_assistant
  ${catkin_LIBRARIES}
  libmsc.so -ldl -pthread
)
=====================================================

# 3. Build in the workspace
cd ~/willard_ws/
catkin_make
source devel/setup.bash

# 4. Run the example
roscore
rosrun robot_voice iat_publish
rosrun robot_voice voice_assistant
rostopic pub /voiceWakeup std_msgs/String "data: 'hello'"
The voice assistant in action is shown in the figure below: