
ChatGLM2-6B Deployment

news 2025/7/21 17:35:38

This article describes how to deploy the ChatGLM2-6B model and run inference with it.

I. Deployment Environment

The model is deployed on an Alibaba Cloud server with the following environment:

modelscope 1.9.5

pytorch 2.0.1

tensorflow 2.13.0

python 3.8

cuda 11.8

ubuntu 20.04

CPU 8 cores

Memory 30 GiB

GPU NVIDIA A10 24GB
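
Before going further, it can help to confirm that the interpreter you will run the demo with actually sees these package versions. The following is a minimal sketch, not part of the original setup: it reads versions from package metadata without importing the packages, and it assumes the pip distribution names `modelscope`, `torch`, and `tensorflow`. The helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata
import platform

# Packages from the environment list above; "torch" is the pip name for PyTorch.
PACKAGES = ("modelscope", "torch", "tensorflow")


def installed_versions(packages=PACKAGES):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this interpreter
    return versions


report = installed_versions()
report["python"] = platform.python_version()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If PyTorch is installed, `torch.cuda.is_available()` can additionally be used to check that the GPU is visible.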

II. Deployment Steps

(1) Download the ChatGLM2-6B code.

git clone https://github.com/THUDM/ChatGLM2-6B.git

(2) Install the dependencies

Enter the ChatGLM2-6B directory and run the following command to install the dependencies.

pip install -r requirements.txt

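(3) Modify cli_demo.py

Running cli_demo.py as-is prints the following TensorFlow messages. They are informational (I) and warning (W) log lines rather than fatal errors, but they clutter the dialogue output.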

ChatGLM：2024-06-20 22:18:27.454216: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-20 22:18:27.914578: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-20 22:18:29.304992: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

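Add the following code at the top of cli_demo.py, before the other imports, so that the setting is in place before TensorFlow is loaded.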

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
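
For context, `TF_CPP_MIN_LOG_LEVEL` filters TensorFlow's C++ log output: `0` shows everything, `1` hides INFO, `2` hides INFO and WARNING, and `3` also hides ERROR. The two lines above can be wrapped as in the sketch below, which uses `setdefault` so that a value already exported in the shell is respected. The function name `quiet_tensorflow_logs` is hypothetical and not part of cli_demo.py.

```python
import os


def quiet_tensorflow_logs(level="2"):
    """Set TF_CPP_MIN_LOG_LEVEL unless the user already chose a value."""
    # 0 = all messages, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = hide ERROR too
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", level)
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


# Must run before anything imports TensorFlow (directly or indirectly).
quiet_tensorflow_logs()
```

Using plain assignment, as in the original two lines, always forces level 2. The `setdefault` variant is only a convenience for temporarily re-enabling the logs from the shell.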

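(4) Download the ChatGLM2-6B model files

The model can be downloaded from Hugging Face or from the ModelScope community (魔搭社区). ModelScope is the faster option here, and the download commands are as follows.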

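# If installing git-lfs directly fails, uncomment the next line to install it manually (Ubuntu)
#curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
apt-get install git-lfs
# Register the Git LFS filters so that the clone fetches the real weight files
git lfs install
git clone https://www.modelscope.cn/ZhipuAI/chatglm2-6b.git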

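(5) Place the model files in THUDM/chatglm2-6b under the code directory

cli_demo.py refers to the model as THUDM/chatglm2-6b, so putting the downloaded files in a local directory with that relative path (ChatGLM2-6B/THUDM/chatglm2-6b) lets it load them from disk instead of fetching them from Hugging Face.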

root@dsw-396000-594d59f669-ph78p:/mnt/workspace/ChatGLM2-6B/THUDM/chatglm2-6b# ls
config.json               configuration.json   MODEL_LICENSE                     pytorch_model-00002-of-00007.bin  pytorch_model-00004-of-00007.bin  pytorch_model-00006-of-00007.bin  pytorch_model.bin.index.json  quickstart.md  tokenization_chatglm.py  tokenizer.model
configuration_chatglm.py  modeling_chatglm.py  pytorch_model-00001-of-00007.bin  pytorch_model-00003-of-00007.bin  pytorch_model-00005-of-00007.bin  pytorch_model-00007-of-00007.bin  quantization.py               README.md      tokenizer_config.json
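
To confirm that the directory is complete before starting the demo, the listing above can be turned into a small check. This is a sketch under the assumption that the files shown in the listing are the ones needed. `EXPECTED_FILES` and `missing_model_files` are names invented for this example.

```python
from pathlib import Path

NUM_SHARDS = 7  # the listing shows pytorch_model-00001 ... -00007

# Code, config, and tokenizer files taken from the directory listing above,
# followed by the seven weight shards.
EXPECTED_FILES = [
    "config.json",
    "configuration_chatglm.py",
    "modeling_chatglm.py",
    "quantization.py",
    "tokenization_chatglm.py",
    "tokenizer_config.json",
    "tokenizer.model",
    "pytorch_model.bin.index.json",
] + [
    f"pytorch_model-{i:05d}-of-{NUM_SHARDS:05d}.bin"
    for i in range(1, NUM_SHARDS + 1)
]


def missing_model_files(model_dir="THUDM/chatglm2-6b"):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files()
print("all expected files present" if not missing else f"missing: {missing}")
```

Run it from the ChatGLM2-6B directory. Note that it checks only for presence: a shard that is just a few hundred bytes is usually a Git LFS pointer file rather than the real weights.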

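III. Model Inference

Enter the ChatGLM2-6B directory and run the following command in a terminal. Once the checkpoint shards have loaded, the demo prints its welcome message in Chinese (type to chat, enter clear to reset the dialogue history, enter stop to quit), followed by the 用户： ("User:") prompt.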

root@dsw-396000-594d59f669-ph78p:/mnt/workspace/ChatGLM2-6B# python cli_demo.py 
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00,  1.12s/it]
欢迎使用 ChatGLM2-6B 模型，输入内容即可进行对话，clear 清空对话历史，stop 终止程序
用户：
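
The welcome message describes two commands: `clear` resets the dialogue history and `stop` ends the program. The sketch below shows that command handling in isolation. It is a simplified illustration, not the actual cli_demo.py code (which may differ, for example by streaming the reply). `run_dialogue` and the stand-in `echo_chat` are hypothetical names, and the stand-in replaces the real model so that the example runs without a GPU.

```python
def run_dialogue(inputs, chat_fn):
    """Feed user inputs to chat_fn, honouring the 'clear' and 'stop' commands.

    chat_fn(query, history) must return (response, new_history).
    Returns the responses produced before 'stop' or the end of the inputs.
    """
    history, responses = [], []
    for query in inputs:
        command = query.strip()
        if command == "stop":   # end the program
            break
        if command == "clear":  # forget the conversation so far
            history = []
            continue
        response, history = chat_fn(query, history)
        responses.append(response)
    return responses


def echo_chat(query, history):
    """Hypothetical stand-in for the model: reports the turn number."""
    return f"[turn {len(history) + 1}] {query}", history + [(query, "...")]


replies = run_dialogue(
    ["hello", "again", "clear", "fresh start", "stop", "ignored"], echo_chat
)
print(replies)
# → ['[turn 1] hello', '[turn 2] again', '[turn 1] fresh start']
```

With the real model, `chat_fn` could be something like `lambda q, h: model.chat(tokenizer, q, history=h)`, assuming `model` and `tokenizer` were loaded from THUDM/chatglm2-6b with `trust_remote_code=True` as described in the ChatGLM2-6B README.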

References:

[1] "[Solved] oneDNN custom operations are on. You may see slightly different numerical", CSDN blog
