
Fixing the Hugging Face Inference API returning content that is too short

By default, the content returned by the Hugging Face Inference API is quite short; you can lengthen it by setting the max_new_tokens parameter.
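In practice, the fix amounts to adding max_new_tokens under "parameters" in the request body. A minimal sketch of the payload, with a placeholder prompt (a complete Python example follows further below):

payload = {
    "inputs": "your prompt here",           # placeholder prompt
    "parameters": {"max_new_tokens": 250},  # lift the default length cap (max 250)
}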

Detailed parameters

When sending your request, you should send a JSON-encoded payload. Here are all the options.

All parameters
inputs (required): a string to be generated from.
parameters: a dict containing the following keys:
    top_k (Default: None). Integer to define the top tokens considered within the sample operation to create new text.
    top_p (Default: None). Float to define the tokens that are within the sample operation of text generation. Tokens are added to the sample from most probable to least probable until the sum of their probabilities is greater than top_p.
    temperature (Default: 1.0). Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always take the highest score, and 100.0 gets close to uniform probability.
    repetition_penalty (Default: None). Float (0.0-100.0). The more a token is used within the generation, the more it is penalized, so it is not picked in successive generation passes.
    max_new_tokens (Default: None). Int (0-250). The number of new tokens to be generated. This does not include the input length; it is an estimate of the size of generated text you want. Each new token slows down the request, so look for a balance between response time and the length of text generated.
    max_time (Default: None). Float (0-120.0). The maximum amount of time in seconds that the query should take. Network overhead makes this a soft limit. Use it in combination with max_new_tokens for best results.
    return_full_text (Default: True). Bool. If set to False, the returned results will not contain the original query, which makes prompting easier.
    num_return_sequences (Default: 1). Integer. The number of generated sequences you want returned.
    do_sample (Optional: True). Bool. Whether or not to use sampling; greedy decoding is used otherwise.
options: a dict containing the following keys:
    use_cache (Default: true). Boolean. There is a cache layer on the Inference API to speed up requests we have already seen. Most models can use those results as-is, since models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a genuinely new query.
    wait_for_model (Default: false). Boolean. If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it limits hanging in your application to known places.
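Putting several of the parameters above together, a request payload might look like the following sketch. The values are illustrative starting points for experimentation, not recommendations:

payload = {
    "inputs": "Write a short story about a robot.",
    "parameters": {
        "max_new_tokens": 200,      # allow a longer completion
        "temperature": 0.8,         # slightly more varied sampling
        "top_p": 0.95,              # nucleus sampling cutoff
        "repetition_penalty": 1.2,  # discourage repeated tokens
        "return_full_text": False,  # drop the prompt from the output
        "do_sample": True,          # sample instead of greedy decoding
    },
    "options": {
        "use_cache": False,         # force a fresh generation
        "wait_for_model": True,     # wait instead of receiving a 503
    },
}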

Python example:

import requests

# Placeholder model name and access token, as in the original post.
API_URL = "https://api-inference.huggingface.co/models/xxxxxxxxx"
headers = {"Authorization": "Bearer xxxxxxxxxxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Raise the generation cap so the reply is no longer cut short.
output = query({
    "inputs": "please write a LRU cache in C++ ",
    "parameters": {"max_new_tokens": 250},
})
print(output)
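The hosted API returns HTTP 503 while a model is still loading, and the docs above advise setting wait_for_model only after seeing that error. A sketch of a retrying variant of the query helper, reusing API_URL and headers from the example above:

import time

def query_with_retry(payload, retries=3):
    for attempt in range(retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 503:
            # Model is still loading: ask the API to hold the request open
            # on the next attempt instead of returning 503 again.
            payload.setdefault("options", {})["wait_for_model"] = True
            time.sleep(1)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("model did not become ready in time")

output = query_with_retry({
    "inputs": "please write a LRU cache in C++ ",
    "parameters": {"max_new_tokens": 250},
})
print(output)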

