当前位置: 首页 > news >正文

【Transformers基础入门篇2】基础组件之Pipeline

文章目录

  • 一、什么是Pipeline
  • 二、查看PipeLine支持的任务类型
  • 三、Pipeline的创建和使用
    • 3.1 根据任务类型,直接创建Pipeline,默认是英文模型
    • 3.2 指定任务类型,再指定模型,创建基于指定模型的Pipeline
    • 3.3 预先加载模型,再创建Pipeline
    • 3.4 使用Gpu进行推理
    • 3.5 查看Device
    • 3.6 测试一下耗时
    • 3.7 确定的Pipeline的参数
  • 四、Pipeline的背后实现


本文为 https://space.bilibili.com/21060026/channel/collectiondetail?sid=1357748的视频学习笔记

项目地址为:https://github.com/zyds/transformers-code


一、什么是Pipeline

  • 将数据预处理、模型调用、结果后处理三部分组装成的流水线,如下流程图
  • 使我们能够直接输入文本便获得最终的答案,不需要我们关注细节
ToKenizer
Model
PostProcessing
Raw text
Input IDs
Logits
Predictions
我觉得不太行
101, 2769, 6230, 2533, 679, 1922, 6121, 8013, 102
0.9736, 0.0264
Positive:0.9736

二、查看PipeLine支持的任务类型

from transformers.pipelines import SUPPORTED_TASKS
from pprint import pprint
for k, v in SUPPORTED_TASKS.items():print(k, v)

输出但其概念PipeLine支持的任务类型以及可以调用的
举例输出:

audio-classification {'impl': <class 'transformers.pipelines.audio_classification.AudioClassificationPipeline'>, 'tf': (), 'pt': (<class 'transformers.models.auto.modeling_auto.AutoModelForAudioClassification'>,), 'default': {'model': {'pt': ('superb/wav2vec2-base-superb-ks', '372e048')}}, 'type': 'audio'}
  • key: 任务的名称,如音频分类
  • v:关于任务的实现,如具体哪个Pipeline,有没有TF模型,有没有pytorch模型, 模型具体是哪一个
    在这里插入图片描述

三、Pipeline的创建和使用

3.1 根据任务类型,直接创建Pipeline,默认是英文模型

from transformers import pipeline
pipe = pipeline("text-classification") # 根据pipeline直接创建一个任务类
pipe("very good") # 测试一个句子,输出结果

3.2 指定任务类型,再指定模型,创建基于指定模型的Pipeline

注,这里我已经将模型离线下载到本地了

# https://huggingface.co/models
pipe = pipeline("text-classification", model="./models/roberta-base-finetuned-dianping-chinese")

3.3 预先加载模型,再创建Pipeline

rom transformers import AutoModelForSequenceClassification, AutoTokenizer# 这种方式,必须同时指定model和tokenizer
model = AutoModelForSequenceClassification.from_pretrained("./models_roberta-base-finetuned-dianping-chinese")
tokenizer = AutoTokenizer.from_pretrained("./models_roberta-base-finetuned-dianping-chinese")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

3.4 使用Gpu进行推理

pipe = pipeline("text-classification", model="./models_roberta-base-finetuned-dianping-chinese", device=0)

3.5 查看Device

pipe.model.device

3.6 测试一下耗时

import torch
import time
times = []
for i in range(100):torch.cuda.synchronize()start = time.time()pipe("我觉得不太行!")torch.cuda.synchronize()end = time.time()times.append(end - start)
print(sum(times) / 100)

3.7 确定的Pipeline的参数

# 先创建一个pipeline
qa_pipe = pipeline("question-answering", model="../../models/models")
qa_pipe

输出
在这里插入图片描述QuestionAnsweringPipeline
在这里插入图片描述
查看定义,会告诉我们这个pipeline该如何使用

class QuestionAnsweringPipeline(ChunkPipeline):"""Question Answering pipeline using any `ModelForQuestionAnswering`. See the [question answeringexamples](../task_summary#question-answering) for more information.Example:```python>>> from transformers import pipeline>>> oracle = pipeline(model="deepset/roberta-base-squad2")>>> oracle(question="Where do I live?", context="My name is Wolfgang and I live in Berlin"){'score': 0.9191, 'start': 34, 'end': 40, 'answer': 'Berlin'}```Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)This question answering pipeline can currently be loaded from [`pipeline`] using the following task identifier:`"question-answering"`.The models that this pipeline can use are models that have been fine-tuned on a question answering task. See theup-to-date list of available models on[huggingface.co/models](https://huggingface.co/models?filter=question-answering)."""

进入pipeline,看__call__,查看可以支持的更多的参数
列出了更多的参数

    def __call__(self, *args, **kwargs):"""Answer the question(s) given as inputs by using the context(s).Args:args ([`SquadExample`] or a list of [`SquadExample`]):One or several [`SquadExample`] containing the question and context.X ([`SquadExample`] or a list of [`SquadExample`], *optional*):One or several [`SquadExample`] containing the question and context (will be treated the same way as ifpassed as the first positional argument).data ([`SquadExample`] or a list of [`SquadExample`], *optional*):One or several [`SquadExample`] containing the question and context (will be treated the same way as ifpassed as the first positional argument).question (`str` or `List[str]`):One or several question(s) (must be used in conjunction with the `context` argument).context (`str` or `List[str]`):One or several context(s) associated with the question(s) (must be used in conjunction with the`question` argument).topk (`int`, *optional*, defaults to 1):The number of answers to return (will be chosen by order of likelihood). Note that we return less thantopk answers if there are not enough options available within the context.doc_stride (`int`, *optional*, defaults to 128):If the context is too long to fit with the question for the model, it will be split in several chunkswith some overlap. This argument controls the size of that overlap.max_answer_len (`int`, *optional*, defaults to 15):The maximum length of predicted answers (e.g., only answers with a shorter length are considered).max_seq_len (`int`, *optional*, defaults to 384):The maximum length of the total sentence (context + question) in tokens of each chunk passed to themodel. The context will be split in several chunks (using `doc_stride` as overlap) if needed.max_question_len (`int`, *optional*, defaults to 64):The maximum length of the question after tokenization. It will be truncated if needed.handle_impossible_answer (`bool`, *optional*, defaults to `False`):Whether or not we accept impossible as an answer.align_to_words (`bool`, *optional*, defaults to `True`):Attempts to align the answer to real words. Improves quality on space separated langages. Might hurt onnon-space-separated languages (like Japanese or Chinese)Return:A `dict` or a list of `dict`: Each result comes as a dictionary with the following keys:- **score** (`float`) -- The probability associated to the answer.- **start** (`int`) -- The character start index of the answer (in the tokenized version of the input).- **end** (`int`) -- The character end index of the answer (in the tokenized version of the input).- **answer** (`str`) -- The answer to the question."""

如下面的例子

我们输出问题:中国的首都是哪里? 给的上下文是:中国的首都是北京

qa_pipe(question="中国的首都是哪里?", context="中国的首都是北京")

在这里插入图片描述

如果通过 max_answer_len参数来限定输出的最大长度,会进行强行截断

qa_pipe(question="中国的首都是哪里?", context="中国的首都是北京", max_answer_len=1)

在这里插入图片描述

四、Pipeline的背后实现

  • step1 初始化组件,Tokenizer,model
# step1 初始化tokenizer, model
tokenizer = AutoTokenizer.from_pretrained("../../models/models_roberta-base-finetuned-dianping-chinese")
model = AutoModelForSequenceClassification.from_pretrained("../../models/models_roberta-base-finetuned-dianping-chinese")
  • step2 预处理
# 预处理,返回pytorch的tensor,是一个dict
input_text = "我觉得不太行!"
inputs = tokenizer(input_text, return_tensors="pt")
inputs

在这里插入图片描述

  • step3 模型预测
res = model(**inputs)
res

在这里插入图片描述
预测的结果,包括的内容有点多,如loss,logits等

  • step4 结果后处理
logits = res.logits
logits = torch.softmax(logits, dim=-1)
pred = torch.argmax(logits).item()
result = model.config.id2label.get(pred)
result

在这里插入图片描述

http://www.lryc.cn/news/445459.html

相关文章:

  • java反射学习总结
  • 探索C语言与Linux编程:获取当前用户ID与进程ID
  • 1.4 边界值分析法
  • Spring IOC容器Bean对象管理-注解方式
  • OpenAI API: How to catch all 5xx errors in Python?
  • C++初阶学习——探索STL奥秘——标准库中的priority_queue与模拟实现
  • PyTorch经典模型
  • C++ STL容器(三) —— 迭代器底层剖析
  • 力扣416周赛
  • vue 页面常用图表框架
  • spring 注解 - @PostConstruct - 用于初始化工作
  • 多机器学习模型学习
  • 【网页设计】前言
  • STM32巡回研讨会总结(2024)
  • 54 螺旋矩阵
  • 基于STM32与OpenCV的物料搬运机械臂设计流程
  • [万字长文]stable diffusion代码阅读笔记
  • watchEffect工作原理
  • 斐波那契数列
  • TCP并发服务器的实现
  • 前端大屏自适应方案
  • 16.3 k8s容器cpu内存告警指标与资源request和limit
  • 【计算机网络 - 基础问题】每日 3 题(二十)
  • 铰链损失函数
  • 项目实战bug修复
  • Git常用指令整理【新手入门级】【by慕羽】
  • 记某学校小程序漏洞挖掘
  • 腾讯百度阿里华为常见算法面试题TOP100(3):链表、栈、特殊技巧
  • Apache CVE-2021-41773 漏洞复现
  • vue-入门速通