当前位置：首页 > news >正文

ChatGPT + Stable Diffusion + 百度AI + MoviePy 实现文字生成视频，小说转视频，自媒体神器！(二)

news 2025/7/5 9:21:52

ChatGPT + Stable Diffusion + 百度AI + MoviePy 实现文字生成视频，小说转视频，自媒体神器！(二)
前言

最近大模型频出，但是对于我们普通人来说，如何使用这些AI工具来辅助我们的工作呢，或者参与进入我们的生活，就着现在比较热门的几个AI，写个一个提高生产力工具，现在在逻辑上已经走通了，后面会针对web页面、后台进行优化。

github链接

B站教程视频 https://www.bilibili.com/video/BV18M4y1H7XN/

第三步、调用百度语音合成包进行语音合成

这里不是智能用百度的API合成，想谷歌的，阿里云的都可以，只是我比较熟悉百度的API ps~: 关键是免费😂

class Main:client_id = client_idclient_secret = client_secretdef create_access_token(self):url = f"https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id={self.client_id}&client_secret={self.client_secret}"payload = ""headers = {'Content-Type': 'application/json','Accept': 'application/json'}response = requests.request("POST", url, headers=headers, data=payload)print("-----------向百度获取 access_token API 发起请求了-----------")access_token = response.json()access_token.update({"time": datetime.now().strftime("%Y-%m-%d")})with open('access_token.json', 'w') as f:json.dump(access_token, f)return access_tokendef get_access_token(self):if os.path.exists('access_token.json'):with open('access_token.json', 'r') as f:data = json.load(f)time = data.get("time")if time and (datetime.now() - datetime.strptime(time, '%Y-%m-%d')).days >= 29:return self.create_access_token()return datareturn self.create_access_token()def text_to_audio(self, text: str, index: int):url = "https://tsn.baidu.com/text2audio"text = text.encode('utf8')FORMATS = {3: "mp3", 4: "pcm", 5: "pcm", 6: "wav"}FORMAT = FORMATS[6]data = {# 合成的文本，文本长度必须小于1024GBK字节。建议每次请求文本不超过120字节，约为60个汉字或者字母数字。"tex": text,# access_token"tok": self.get_access_token().get("access_token"),# 用户唯一标识，用来计算UV值。建议填写能区分用户的机器 MAC 地址或 IMEI 码，长度为60字符以内"cuid": hex(uuid.getnode()),# 客户端类型选择，web端填写固定值1"ctp": "1",# 固定值zh。语言选择,目前只有中英文混合模式，填写固定值zh"lan": "zh",# 语速，取值0-15，默认为5中语速"spd": 5,# 音调，取值0-15，默认为5中语调"pit": 5,# 音量，基础音库取值0-9，精品音库取值0-15，默认为5中音量（取值为0时为音量最小值，并非为无声）"vol": 5,# (基础音库) 度小宇=1，度小美=0，度逍遥（基础）=3，度丫丫=4# (精品音库) 度逍遥（精品）=5003，度小鹿=5118，度博文=106，度小童=110，度小萌=111，度米朵=103，度小娇=5"per": 5003,# 3为mp3格式(默认)； 4为pcm-16k；5为pcm-8k；6为wav（内容同pcm-16k）; 注意aue=4或者6是语音识别要求的格式，但是音频内容不是语音识别要求的自然人发音，所以识别效果会受影响。"aue": FORMAT}data = urllib.parse.urlencode(data)response = requests.post(url, data)if response.status_code == 200:result_str = response.contentsave_file = str(index) + '.' + FORMATaudio = file_path + "audio"if not os.path.isdir(audio):os.mkdir(audio)audio_path = f'{audio}/' + save_filewith open(audio_path, 'wb') as of:of.write(result_str)return audio_pathelse:return False

当然了，这个设计也是热拔插的，以后这些数据都会做成动态的，在页面用户可以调整，也可以选择其他的API服务商

第四步、调用百度语音合成包进行语音合成

这里就比较麻烦了，首先要搭建起 Stable Diffusion 的环境，Window 用户我记得有一个绘世
的软件，一键就可以安装，mac用户要去官网下载。

class Main:sd_url = sd_urldef draw_picture(self, obj_list):""":param obj_list::return: 图片地址列表"""picture_path_list = []for index, obj in enumerate(obj_list):novel_dict = {"enable_hr": "false","denoising_strength": 0,"firstphase_width": 0,"firstphase_height": 0,"hr_scale": 2,"hr_upscaler": "string","hr_second_pass_steps": 0,"hr_resize_x": 0,"hr_resize_y": 0,"prompt": "{}".format(obj["prompt"]),"styles": ["string"],"seed": -1,"subseed": -1,"subseed_strength": 0,"seed_resize_from_h": -1,"seed_resize_from_w": -1,"sampler_name": "DPM++ SDE Karras","batch_size": 1,"n_iter": 1,"steps": 50,"cfg_scale": 7,"width": 1024,"height": 768,"restore_faces": "false","tiling": "false","do_not_save_samples": "false","do_not_save_grid": "false","negative_prompt": obj["negative"],"eta": 0,"s_churn": 0,"s_tmax": 0,"s_tmin": 0,"s_noise": 1,"override_settings": {},"override_settings_restore_afterwards": "true","script_args": [],"sampler_index": "DPM++ SDE Karras","script_name": "","send_images": "true","save_images": "true","alwayson_scripts": {}}html = requests.post(self.sd_url, data=json.dumps(novel_dict))img_response = json.loads(html.text)image_bytes = base64.b64decode(img_response['images'][0])image = Image.open(io.BytesIO(image_bytes))# 图片存放new_path = file_path + 'picture'if not os.path.exists(new_path):os.makedirs(new_path)picture_name = str(obj['index']) + ".png"image_path = os.path.join(new_path, picture_name)image.save(image_path)picture_path_list.append(image_path)print(f"-----------生成第{index}张图片-----------")return picture_path_list

后期我看看能不能引入 Midjuorney 的服务商，或者他们官方的API ps ~ 做人没有梦想和咸鱼有什么区别🥳

第五步、使用moviepy将图片和语音结合起来生成视频

moviepy中文文档

import os
from moviepy.editor import ImageSequenceClip, AudioFileClip, concatenate_videoclips
import numpy as npfrom config import file_pathclass Main:def merge_video(self, picture_path_list: list, audio_path_list: list, name: str):""":param picture_path_list: 图片路径列表:param audio_path_list: 音频路径列表:return:"""clips = []for index, value in enumerate(picture_path_list):audio_clip = AudioFileClip(audio_path_list[index])img_clip = ImageSequenceClip([picture_path_list[index]], audio_clip.duration)img_clip = img_clip.set_position(('center', 'center')).fl(self.fl_up, apply_to=['mask']).set_duration(audio_clip.duration)clip = img_clip.set_audio(audio_clip)clips.append(clip)print(f"-----------生成第{index}段视频-----------")print(f"-----------开始合成视频-----------")final_clip = concatenate_videoclips(clips)new_parent = file_path + "video/"if not os.path.exists(new_parent):os.makedirs(new_parent)final_clip.write_videofile(new_parent + name + ".mp4", fps=24, audio_codec="aac")def fl_up(self, gf, t):# 获取原始图像帧frame = gf(t)# 进行滚动效果，将图像向下滚动50像素height, width = frame.shape[:2]scroll_y = int(t * 10)  # 根据时间t计算滚动的像素数new_frame = np.zeros_like(frame)# 控制滚动的范围，避免滚动超出图像的边界if scroll_y < height:new_frame[:height - scroll_y, :] = frame[scroll_y:, :]return new_frame