FFmpeg Advanced: Transcoding Audio with Audio Filters

news 2025/6/27 3:29:46

Contents

- Sample bit depth
- Sample rate
- Channel layout
- Bit rate
- Transcoding with FFmpeg audio filters
- Reference links

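To make a media file suitable for different playback scenarios, we often need to transcode its audio. Transcoding mostly comes down to changing the audio parameters: sample bit depth, sample rate, channel layout, bit rate, and so on. The sections below explain what each parameter means and what it does.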

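Sample bit depth

The sample bit depth, also called bit depth or resolution, is the number of discrete levels into which the continuous amplitude of a sound is divided once it is digitized. N-bit audio divides the amplitude range evenly into 2^N levels, so 16-bit audio has 65,536 levels. That is already a large number; a listener is unlikely to distinguish a 1/65536 difference in amplitude. Bit depth is also described as the resolution of the sound card: the larger the value, the finer the resolution and the more faithfully the card can reproduce sound. Note that bit depth describes the amplitude of the signal, whereas sample rate describes its behaviour in time (frequency). They are two separate concepts.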
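The 2^N relationship can be checked with a few lines of C++. This is a minimal sketch; the helper names `quantization_levels` and `dynamic_range_db` are made up for illustration and are not FFmpeg functions. The second helper uses the standard rule of thumb that each extra bit adds about 6.02 dB of theoretical dynamic range.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: number of quantization levels of an N-bit integer sample (2^N).
std::uint64_t quantization_levels(int bits) {
    assert(bits >= 1 && bits <= 63);  // keep the shift within a 64-bit value
    return std::uint64_t{1} << bits;
}

// Hypothetical helper: theoretical dynamic range in dB, 20 * log10(2^N).
double dynamic_range_db(int bits) {
    return 20.0 * std::log10(static_cast<double>(quantization_levels(bits)));
}
```

For 16-bit samples this gives 65,536 levels and a theoretical dynamic range of roughly 96 dB.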

The sample formats commonly used in FFmpeg, which encode the bit depth, are listed below:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double
    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar
    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};
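The formats marked "planar" store each channel in its own buffer, while the others are packed (interleaved): the samples of all channels alternate in one buffer. The sketch below illustrates the difference with plain `std::vector` buffers rather than FFmpeg's `AVFrame`; the helper name `deinterleave` is hypothetical and only meant to show the memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FFmpeg): splits packed samples
// (L R L R ...) into one plane per channel (L L L ..., R R R ...).
// Trailing samples that do not form a complete frame are ignored.
std::vector<std::vector<std::int16_t>> deinterleave(
        const std::vector<std::int16_t>& packed, std::size_t channels) {
    assert(channels > 0);
    std::vector<std::vector<std::int16_t>> planes(channels);
    const std::size_t frames = packed.size() / channels;
    for (auto& plane : planes) {
        plane.reserve(frames);
    }
    for (std::size_t i = 0; i < frames; ++i) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            planes[ch].push_back(packed[i * channels + ch]);
        }
    }
    return planes;
}
```

Applied to packed stereo data this turns a layout like AV_SAMPLE_FMT_S16 into the layout of AV_SAMPLE_FMT_S16P: plane 0 holds the left samples and plane 1 the right samples.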

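Sample rate

Audio sampling converts sound from an analog signal into a digital one. The sample rate is the number of times per second the sound is captured, which is also the number of samples per second in the resulting digital signal. Commonly used sample rates are:
8,000 Hz - used for telephony; sufficient for human speech
11,025 Hz - roughly AM radio quality
22,050 Hz to 24,000 Hz - roughly FM radio quality
32,000 Hz - used by miniDV digital video camcorders and DAT (LP mode)
44,100 Hz - used by audio CDs, and also commonly by MPEG-1 audio (VCD, SVCD, MP3)
47,250 Hz - used by commercial PCM recorders
48,000 Hz - used for digital sound in miniDV, digital TV, DVD, DAT, film and professional audio
50,000 Hz - used by commercial digital recorders
96,000 Hz or 192,000 Hz - used by DVD-Audio, some LPCM DVD audio tracks, BD-ROM (Blu-ray Disc) audio tracks and HD-DVD (high-definition DVD) audio tracks
2.8224 MHz - used by the 1-bit sigma-delta modulation process of Direct Stream Digital

The higher the sample rate, the more faithfully and naturally the sound is reproduced. Human hearing covers roughly 20 Hz to 20,000 Hz, and by the Nyquist sampling theorem a signal must be sampled at more than twice its highest frequency to be reconstructed, so covering the full audible range takes a sample rate above 40,000 Hz. 22,050 Hz is still common where bandwidth or file size matters, because it captures content up to about 11 kHz; 44,100 Hz is CD quality; and rates above 48,000 Hz bring little audible benefit for playback. This is similar to how film gets by with 24 frames per second.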
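The relationship between sample rate and the highest frequency it can represent follows from the Nyquist sampling theorem: a sample rate of f can only capture content up to f/2. A minimal sketch, with hypothetical helper names `nyquist_hz` and `covers`:

```cpp
#include <cassert>

// Hypothetical helper: highest frequency (Hz) a sample rate can represent,
// i.e. the Nyquist frequency, which is half the sample rate.
double nyquist_hz(int sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return sample_rate_hz / 2.0;
}

// Hypothetical helper: true if the sample rate can capture content up to max_freq_hz.
bool covers(int sample_rate_hz, double max_freq_hz) {
    return nyquist_hz(sample_rate_hz) >= max_freq_hz;
}
```

With these helpers, `covers(44100, 20000.0)` is true while `covers(22050, 20000.0)` is false, which is why 44,100 Hz covers the whole audible range and 22,050 Hz only covers content up to about 11 kHz.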

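Channel layout

When people hear a sound they can locate its source, so placing sound sources at different positions produces a better listening experience. Common channel layouts are:

Mono: a single channel
Stereo (two channels): the most common type, with a left and a right channel
2.1: stereo plus one low-frequency channel
5.1: one front center channel, a front left, a front right, a left surround and a right surround channel, plus one low-frequency channel; first used in early cinemas
7.1: based on 5.1, with the left and right surround channels split into left/right surround and left/right rear channels; mainly used for Blu-ray Disc and modern cinemas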
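In the "X.Y" notation used above, X is the number of full-range channels and Y the number of low-frequency channels, so the total channel count is X + Y. The sketch below turns a layout name into a channel count; `channel_count` is a hypothetical helper for illustration only, and it deliberately handles just "mono", "stereo" and plain "X.Y" names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: total number of channels for a layout name.
// "X.Y" means X full-range channels plus Y low-frequency channels.
int channel_count(const std::string& layout) {
    if (layout == "mono") return 1;
    if (layout == "stereo") return 2;
    const auto dot = layout.find('.');
    assert(dot != std::string::npos && dot > 0 && dot + 1 < layout.size());
    const int full_range = std::stoi(layout.substr(0, dot));
    const int low_frequency = std::stoi(layout.substr(dot + 1));
    return full_range + low_frequency;
}
```

So "2.1" has 3 channels, "5.1" has 6 and "7.1" has 8.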

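Bit rate

The bit rate is the amount of data transmitted per second. Compressed audio files are usually described by their bit rate; for example, an MP3 often described as close to CD quality is 128 kbps at 44,100 Hz. Note that the unit here is the bit, not the byte: one byte equals 8 bits. The bit is the smallest unit and is generally used for network and other communication speeds, while the byte is used for disk and memory sizes.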
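To put a figure such as 128 kbps in context, it helps to compare it with the bit rate of uncompressed PCM, which is simply sample rate × bit depth × channel count. The following is a minimal sketch with hypothetical helper names; it also converts a bit rate and duration into an approximate size in bytes, which is where the bit/byte distinction matters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit rate of uncompressed PCM in bits per second.
std::int64_t pcm_bit_rate(int sample_rate_hz, int bits_per_sample, int channels) {
    assert(sample_rate_hz > 0 && bits_per_sample > 0 && channels > 0);
    return static_cast<std::int64_t>(sample_rate_hz) * bits_per_sample * channels;
}

// Hypothetical helper: approximate payload size in bytes
// (8 bits per byte, container overhead ignored).
std::int64_t stream_size_bytes(std::int64_t bit_rate_bps, std::int64_t seconds) {
    return bit_rate_bps * seconds / 8;
}
```

For CD audio (44,100 Hz, 16-bit, stereo) `pcm_bit_rate` returns 1,411,200 bit/s, about 11 times the 128 kbps MP3. One minute of audio is roughly 10.6 MB uncompressed versus about 0.96 MB at 128 kbps.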

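Transcoding with FFmpeg audio filters

Different playback scenarios place different requirements on audio, so the audio parameters have to be adjusted for each one. This section shows how to adjust them with an audio filter; the implementation is as follows: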

// Written against the FFmpeg 4.x API (channel_layout, av_init_packet);
// newer releases replace these with AVChannelLayout and av_packet_alloc.
#include "../audio_filter.h"  // the author's AudioFilter / AudioConfig wrapper

extern "C"
{
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libavfilter/avfilter.h>
#include <libswresample/swresample.h>
}

#include <cstdint>
#include <cstdio>
#include <string>

/**
 * @brief Convert audio data to a different format
 * @param[in]  output_filename  output file name
 * @param[in]  input_filename   input file name
 * @param[in]  sample_fmt       sample format
 * @param[in]  sample_rate      sample rate
 * @param[in]  channel_layout   channel layout
 * @param[in]  bitrate          bit rate
 * @return  result of the call
 * - 0      success
 * - other  failure
 */
int transcode_audio(const char *output_filename, const char *input_filename, AVSampleFormat sample_fmt,
                    int sample_rate, uint64_t channel_layout, uint64_t bitrate)
{
    // Input and output format contexts
    AVFormatContext *inFmtCtx = nullptr;
    AVFormatContext *outFmtCtx = nullptr;
    // Decoder and encoder contexts
    AVCodecContext *aDecCtx = nullptr;
    AVCodecContext *aEncCtx = nullptr;
    // Output stream and index of the input audio stream
    AVStream *aOutStream = nullptr;
    int audioIndex = -1;
    int ret;

    // Open the input file
    ret = avformat_open_input(&inFmtCtx, input_filename, nullptr, nullptr);
    if (ret < 0) {
        return -1;
    }
    ret = avformat_find_stream_info(inFmtCtx, nullptr);
    if (ret < 0) {
        return -1;
    }

    // Create the output context
    ret = avformat_alloc_output_context2(&outFmtCtx, nullptr, nullptr, output_filename);
    if (ret < 0) {
        return -1;
    }

    for (unsigned int i = 0; i < inFmtCtx->nb_streams; ++i) {
        AVStream *inStream = inFmtCtx->streams[i];
        if (inStream->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            audioIndex = static_cast<int>(i);

            // Decoder for the input stream
            const AVCodec *decoder = avcodec_find_decoder(inStream->codecpar->codec_id);
            aDecCtx = avcodec_alloc_context3(decoder);
            ret = avcodec_parameters_to_context(aDecCtx, inStream->codecpar);
            ret = avcodec_open2(aDecCtx, decoder, nullptr);
            if (ret < 0) {
                return -1;
            }

            // Encoder for the output stream
            const AVCodec *encoder = avcodec_find_encoder(outFmtCtx->oformat->audio_codec);
            if (!encoder) {
                return -1;
            }
            aOutStream = avformat_new_stream(outFmtCtx, encoder);
            aOutStream->id = outFmtCtx->nb_streams - 1;
            aEncCtx = avcodec_alloc_context3(encoder);
            aEncCtx->codec_id = encoder->id;
            // Fall back to the decoder's values when a parameter is not specified.
            // AV_SAMPLE_FMT_U8 is 0, so test against AV_SAMPLE_FMT_NONE, not against 0.
            aEncCtx->sample_fmt = sample_fmt != AV_SAMPLE_FMT_NONE ? sample_fmt : aDecCtx->sample_fmt;
            aEncCtx->sample_rate = sample_rate ? sample_rate : aDecCtx->sample_rate;
            aEncCtx->channel_layout = channel_layout;
            aEncCtx->channels = av_get_channel_layout_nb_channels(channel_layout);
            aEncCtx->bit_rate = bitrate ? static_cast<int64_t>(bitrate) : aDecCtx->bit_rate;
            aEncCtx->time_base = AVRational{ 1, aEncCtx->sample_rate };
            aOutStream->time_base = aEncCtx->time_base;
            if (outFmtCtx->oformat->flags & AVFMT_GLOBALHEADER)
                aEncCtx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
            ret = avcodec_open2(aEncCtx, encoder, nullptr);
            if (ret < 0) {
                return -1;
            }
            ret = avcodec_parameters_from_context(aOutStream->codecpar, aEncCtx);
            av_dict_copy(&aOutStream->metadata, inStream->metadata, 0);
            break;
        }
    }
    if (audioIndex < 0) {
        return -1;  // the input has no audio stream
    }

    if (!(outFmtCtx->oformat->flags & AVFMT_NOFILE)) {
        ret = avio_open(&outFmtCtx->pb, output_filename, AVIO_FLAG_WRITE);
        if (ret < 0) {
            return -1;
        }
    }
    ret = avformat_write_header(outFmtCtx, nullptr);
    if (ret < 0) {
        return -1;
    }

    AVFrame *inAudioFrame = av_frame_alloc();
    AVFrame *outAudioFrame = av_frame_alloc();
    outAudioFrame->format = aEncCtx->sample_fmt;
    outAudioFrame->sample_rate = aEncCtx->sample_rate;
    outAudioFrame->channel_layout = aEncCtx->channel_layout;
    outAudioFrame->nb_samples = aEncCtx->frame_size;
    ret = av_frame_get_buffer(outAudioFrame, 0);
    int64_t audio_pts = 0;

    // Filter that converts the audio frames to the target format
    AudioFilter filter;
    char description[512];
    AudioConfig inConfig(aDecCtx->sample_fmt, aDecCtx->sample_rate, aDecCtx->channel_layout, aDecCtx->time_base);
    AudioConfig outConfig(aEncCtx->sample_fmt, aEncCtx->sample_rate, aEncCtx->channel_layout, aEncCtx->time_base);
    char ch_layout[64];
    av_get_channel_layout_string(ch_layout, sizeof(ch_layout),
                                 av_get_channel_layout_nb_channels(aEncCtx->channel_layout),
                                 aEncCtx->channel_layout);
    snprintf(description, sizeof(description),
             "[in]aresample=sample_rate=%d[res];[res]aformat=sample_fmts=%s:sample_rates=%d:channel_layouts=%s[out]",
             aEncCtx->sample_rate,
             av_get_sample_fmt_name(aEncCtx->sample_fmt),
             aEncCtx->sample_rate,
             ch_layout);
    filter.create(description, &inConfig, &outConfig);
    filter.dumpGraph();

    while (true) {
        // Read a packet, decode it and push the frame through the filter
        AVPacket inPacket{ nullptr };
        av_init_packet(&inPacket);
        ret = av_read_frame(inFmtCtx, &inPacket);
        if (ret == AVERROR_EOF) {
            break;
        }
        else if (ret < 0) {
            return -1;
        }
        // Compare against the index of the input audio stream, not the output stream
        if (inPacket.stream_index == audioIndex) {
            ret = avcodec_send_packet(aDecCtx, &inPacket);
            if (ret != 0) {
                printf("send packet error\n");
            }
            ret = avcodec_receive_frame(aDecCtx, inAudioFrame);
            if (ret == 0) {
                ret = filter.addInput1(inAudioFrame);
                av_frame_unref(inAudioFrame);
                if (ret < 0) {
                    printf("add filter input1 error\n");
                }
                do {
                    outAudioFrame->nb_samples = aEncCtx->frame_size;
                    ret = filter.getFrame(outAudioFrame);
                    if (ret == 0) {
                        outAudioFrame->pts = audio_pts;
                        audio_pts += outAudioFrame->nb_samples;
                        ret = avcodec_send_frame(aEncCtx, outAudioFrame);
                        if (ret < 0) {
                            printf("unable to send frame\n");
                        }
                    }
                    else {
                        // No complete filtered frame available yet
                        break;
                    }
                    do {
                        AVPacket outPacket{ nullptr };
                        av_init_packet(&outPacket);
                        ret = avcodec_receive_packet(aEncCtx, &outPacket);
                        if (ret == 0) {
                            av_packet_rescale_ts(&outPacket, aEncCtx->time_base, aOutStream->time_base);
                            outPacket.stream_index = aOutStream->index;
                            ret = av_interleaved_write_frame(outFmtCtx, &outPacket);
                            if (ret < 0) {
                                printf("unable to write packet\n");
                                break;
                            }
                        }
                        else {
                            // AVERROR(EAGAIN): the encoder needs more frames first
                            break;
                        }
                    } while (true);
                } while (true);
            }
            else {
                printf("unable to receive frame\n");
            }
        }
        av_packet_unref(&inPacket);
    }

    // Drain the frames still buffered in the filter, then flush the encoder
    int eof = 0;
    do {
        ret = filter.getFrame(outAudioFrame);
        if (ret == 0) {
            outAudioFrame->pts = audio_pts;
            audio_pts += outAudioFrame->nb_samples;
        }
        else {
            printf("filter queue finished\n");
        }
        // Sending nullptr puts the encoder into flush mode
        ret = avcodec_send_frame(aEncCtx, ret == 0 ? outAudioFrame : nullptr);
        do {
            AVPacket outPacket{ nullptr };
            av_init_packet(&outPacket);
            ret = avcodec_receive_packet(aEncCtx, &outPacket);
            if (ret == 0) {
                av_packet_rescale_ts(&outPacket, aEncCtx->time_base, aOutStream->time_base);
                outPacket.stream_index = aOutStream->index;
                ret = av_interleaved_write_frame(outFmtCtx, &outPacket);
                if (ret < 0) {
                    eof = 1;
                    break;
                }
            }
            else if (ret == AVERROR_EOF) {
                eof = 1;
                break;
            }
            else {
                break;
            }
        } while (true);
    } while (!eof);

    // Release the resources
    filter.destroy();
    av_write_trailer(outFmtCtx);
    av_frame_free(&inAudioFrame);
    av_frame_free(&outAudioFrame);
    avcodec_free_context(&aDecCtx);
    avcodec_free_context(&aEncCtx);
    avformat_close_input(&inFmtCtx);  // also frees inFmtCtx and sets it to nullptr
    if (!(outFmtCtx->oformat->flags & AVFMT_NOFILE)) {
        avio_closep(&outFmtCtx->pb);
    }
    avformat_free_context(outFmtCtx);
    return 0;
}
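The core of the function is the filtergraph description string: `aresample` changes the sample rate, and `aformat` then constrains the sample format, sample rate and channel layout of the output. The sketch below builds the same string with `std::string` so it can be inspected without linking FFmpeg. The helper name `build_filter_description` is hypothetical, and the format and layout names passed in are assumed to be the strings that `av_get_sample_fmt_name` and `av_get_channel_layout_string` would return (for example "fltp" and "stereo").

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the filtergraph description used above.
// sample_fmt and ch_layout are the textual names FFmpeg uses, e.g. "fltp", "stereo".
std::string build_filter_description(int sample_rate,
                                     const std::string& sample_fmt,
                                     const std::string& ch_layout) {
    assert(sample_rate > 0);
    const std::string rate = std::to_string(sample_rate);
    std::string description;
    description += "[in]aresample=sample_rate=" + rate + "[res];";
    description += "[res]aformat=sample_fmts=" + sample_fmt;
    description += ":sample_rates=" + rate;
    description += ":channel_layouts=" + ch_layout + "[out]";
    return description;
}
```

For a 22,050 Hz, planar float, stereo target this yields `[in]aresample=sample_rate=22050[res];[res]aformat=sample_fmts=fltp:sample_rates=22050:channel_layouts=stereo[out]`.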

Here we change the sample format of the audio file to AV_SAMPLE_FMT_FLTP, lower the sample rate to 22,050 Hz, and set the bit rate to 80 kbps.

int main(int argc, char* argv[])
{
    if (argc != 3) {
        printf("usage: %s <input file path> <output file path>\n", argv[0]);
        return -1;
    }
    // Input and output file paths
    std::string fileInput = std::string(argv[1]);
    std::string fileOutput = std::string(argv[2]);
    return transcode_audio(fileOutput.c_str(), fileInput.c_str(), AV_SAMPLE_FMT_FLTP,
                           22050, AV_CH_LAYOUT_STEREO, 80000);
}