Deploying llama.cpp (Windows)

1. Download the source code and model

# Download the source code
git clone https://github.com/ggerganov/llama.cpp.git
# Download the llama-7b model
git clone https://www.modelscope.cn/skyline2006/llama-7b.git
Check the CMake version:
D:\pyworkspace\llama_cpp\llama.cpp\build>cmake --version
cmake version 3.22.0-rc2

CMake suite maintained and supported by Kitware (kitware.com/cmake).
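
To check the version from a script rather than by eye, the first line of the output can be parsed. This is a minimal Python sketch; the helper names and the minimum version passed to `is_new_enough` are assumptions for illustration, so check the `cmake_minimum_required` line in llama.cpp's CMakeLists.txt for the real requirement.

```python
import re
from typing import Tuple


def parse_cmake_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no CMake version found in output")
    return tuple(int(part) for part in match.groups())


def is_new_enough(output: str, minimum: Tuple[int, int, int]) -> bool:
    """Compare the parsed version against a caller-supplied minimum (hypothetical value)."""
    return parse_cmake_version(output) >= minimum


# Sample text copied from the console output above; a suffix such as "-rc2" is ignored.
sample = (
    "cmake version 3.22.0-rc2\n\n"
    "CMake suite maintained and supported by Kitware (kitware.com/cmake)."
)
print(parse_cmake_version(sample))
```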

2. Start the build

# Enter the llama.cpp directory
cd llama.cpp
mkdir build
cd build
cmake ..

Configure output:

D:\pyworkspace\llama_cpp\llama.cpp\build>cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.22631.
-- The C compiler identification is MSVC 19.29.30137.0
-- The CXX compiler identification is MSVC 19.29.30137.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: D:/Git/Git/cmd/git.exe (found version "2.29.2.windows.2")
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- CMAKE_GENERATOR_PLATFORM:
-- x86 detected
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
-- Performing Test HAS_AVX512_2
-- Performing Test HAS_AVX512_2 - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: D:/pyworkspace/llama_cpp/llama.cpp/build
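
The `HAS_*` lines in this log report which SIMD instruction sets passed CMake's compile tests. On this machine AVX, AVX2 and FMA succeeded and AVX-512 failed. The sketch below summarises such a log; the function name is hypothetical and the regular expression only assumes the line format shown above.

```python
import re
from typing import Dict


def simd_test_results(log: str) -> Dict[str, bool]:
    """Map each 'Performing Test HAS_xxx' entry to True (Success) or False (Failed)."""
    results = {}
    pattern = r"Performing Test (HAS_\w+) - (Success|Failed)"
    for name, outcome in re.findall(pattern, log):
        results[name] = outcome == "Success"
    return results


# Excerpt copied from the configure output above.
log_excerpt = """\
-- Performing Test HAS_AVX_1
-- Performing Test HAS_AVX_1 - Success
-- Performing Test HAS_AVX2_1
-- Performing Test HAS_AVX2_1 - Success
-- Performing Test HAS_FMA_1
-- Performing Test HAS_FMA_1 - Success
-- Performing Test HAS_AVX512_1
-- Performing Test HAS_AVX512_1 - Failed
"""

for feature, ok in simd_test_results(log_excerpt).items():
    print(feature, "available" if ok else "not available")
```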

On my machine, building with the Release configuration produced errors, so I switched to Debug. This step uses Visual Studio to do the build.

cmake --build . --config Debug
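
The "try Release, fall back to Debug" approach described above can be scripted. This is only a sketch: `build_with_fallback` and its parameters are hypothetical names, and it assumes `cmake` is on the PATH and that the script runs from the llama.cpp root with the `build` directory already configured.

```python
import subprocess
from typing import Callable, List, Sequence


def build_command(config: str) -> List[str]:
    """Return the same cmake invocation used above for the given configuration."""
    return ["cmake", "--build", ".", "--config", config]


def build_with_fallback(
    configs: Sequence[str] = ("Release", "Debug"),
    run: Callable = subprocess.run,
    cwd: str = "build",
) -> str:
    """Try each configuration in order and return the first one that builds."""
    for config in configs:
        result = run(build_command(config), cwd=cwd)
        if result.returncode == 0:
            return config
    raise RuntimeError("all configurations failed: " + ", ".join(configs))


# Usage (runs the real build, so it is left commented out here):
# print("built with", build_with_fallback())
```

A Debug build is expected to run noticeably slower than a Release build, so it may be worth investigating the Release error once everything else works.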

Build output:

D:\pyworkspace\llama_cpp\llama.cpp\build>cmake --build . --config Debug
Microsoft (R) Build Engine version 16.11.2+f32259642 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

  Checking Build System
  Generating build details from Git
  -- Found Git: D:/Git/Git/cmd/git.exe (found version "2.29.2.windows.2")
  Building Custom Rule D:/pyworkspace/llama_cpp/llama.cpp/common/CMakeLists.txt
  build-info.cpp
  build_info.vcxproj -> D:\pyworkspace\llama_cpp\llama.cpp\build\common\build_info.dir\Debug\build_info.lib
  Building Custom Rule D:/pyworkspace/llama_cpp/llama.cpp/CMakeLists.txt
  ggml.c

On my machine, quantize.exe, main.exe and the other executables were generated in D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug.

3. Quantization and inference

Install the required Python dependencies:

python -m pip install -r requirements.txt

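Place the downloaded llama-7b model in the models directory, then run the following command from the llama.cpp root directory. It generates ggml-model-f16.gguf inside the llama-7b directory: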

python convert.py models/llama-7b/

Quantize the generated file:

D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug\quantize.exe ./models/llama-7b/ggml-model-f16.gguf ./models/llama-7b/ggml-model-q4_0.gguf q4_0
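
To get a rough idea of how much smaller the q4_0 file should be than the f16 file, the arithmetic can be sketched as below. It rests on assumptions not stated in this article: that q4_0 stores weights in blocks of 32, each block holding one 16-bit scale plus 16 bytes of 4-bit values, and that llama-7b has roughly 6.7 billion parameters. It also ignores file metadata and any tensors kept at higher precision, so treat the result as an estimate only.

```python
# Assumed q4_0 block layout: 32 weights share one fp16 scale (2 bytes)
# and are packed two per byte (16 bytes).
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 2 + 16


def q4_0_bits_per_weight() -> float:
    """Effective bits per weight under the assumed block layout."""
    return Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def estimate_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weight data in bytes."""
    return n_params * bits_per_weight / 8


N_PARAMS = 6.7e9  # approximate parameter count, an assumption

f16_gb = estimate_bytes(N_PARAMS, 16) / 1e9
q4_gb = estimate_bytes(N_PARAMS, q4_0_bits_per_weight()) / 1e9
print(f"f16  ~ {f16_gb:.1f} GB")
print(f"q4_0 ~ {q4_gb:.1f} GB")
```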

Run inference:

D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug\main.exe -m ./models/llama-7b/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
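
The same command can be assembled from Python, which makes each flag easier to see and change. This is a sketch: the function and parameter names are hypothetical, and the flag descriptions in the comments reflect my understanding of main.exe at the time of writing, so confirm them with `main.exe --help`.

```python
from typing import List


def main_command(
    exe: str,
    model: str,
    prompt_file: str,
    n_predict: int = 256,
    repeat_penalty: float = 1.0,
    reverse_prompt: str = "User:",
) -> List[str]:
    """Build the argument list for the inference command shown above."""
    return [
        exe,
        "-m", model,                              # model file to load
        "-n", str(n_predict),                     # number of tokens to generate
        "--repeat_penalty", str(repeat_penalty),  # 1.0 applies no penalty
        "--color",                                # colourise the output
        "-i",                                     # interactive mode
        "-r", reverse_prompt,                     # hand control back when this text appears
        "-f", prompt_file,                        # read the initial prompt from a file
    ]


cmd = main_command(
    r"D:\pyworkspace\llama_cpp\llama.cpp\build\bin\Debug\main.exe",
    "./models/llama-7b/ggml-model-q4_0.gguf",
    "prompts/chat-with-bob.txt",
)
print(" ".join(cmd))
# To launch it: import subprocess; subprocess.run(cmd)
```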
