
[Finetune] (1) BitFit Fine-Tuning with transformers

news 2025/7/29 7:14:27

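Table of Contents

0. Introduction to Parameter-Efficient Fine-Tuning
1. Common Fine-Tuning Methods
2. Hands-On Code
- 2.1 Imports
- 2.2 Loading the Dataset
- 2.3 Preprocessing the Dataset
- 2.4 Creating the Model
- 2.5 BitFit Fine-Tuning*
- 2.6 Configuring Training Arguments
- 2.7 Creating the Trainer
- 2.8 Training the Model
- 2.9 Inference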

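0. Introduction to Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning trains only a small subset of a model's parameters (either parameters the model already has or new ones introduced from outside), yet it can change model performance significantly, and in some scenarios it matches full fine-tuning.
Because only a small subset of parameters is trained, the compute needed to train a large model drops sharply: no multi-node, multi-GPU setup is required, and some large models can be trained on a single GPU. Storage requirements drop as well. Most parameter-efficient methods only need to save the trained parameters, which are almost negligible next to an original model that can easily take tens of GB.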
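To make the storage argument concrete, here is a back-of-the-envelope sketch. The parameter counts are hypothetical round numbers chosen for illustration, not measurements of any particular model, and the sketch assumes 2 bytes per parameter (fp16 weights).

```python
def checkpoint_size_mb(num_params, bytes_per_param=2):
    """Approximate checkpoint size in MB for a given number of parameters."""
    return num_params * bytes_per_param / 1e6


# Hypothetical numbers, for illustration only.
total_params = 7_000_000_000   # a 7B-parameter model
trainable_params = 7_000_000   # 0.1% of the parameters are trained

full_mb = checkpoint_size_mb(total_params)
subset_mb = checkpoint_size_mb(trainable_params)

print(f"full checkpoint:     {full_mb:,.0f} MB")
print(f"trained subset only: {subset_mb:,.0f} MB")
print(f"ratio:               {subset_mb / full_mb:.2%}")
```

Under these assumptions the trained subset is about a thousandth of the full checkpoint, which is why saving only the trained parameters is so cheap.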

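1. Common Fine-Tuning Methods

Common fine-tuning methods are shown in the figure below:
[Figure: overview of common fine-tuning methods; image not available in this copy]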

Lialin, Vladislav, Vijeta Deshpande, and Anna Rumshisky. “Scaling down to scale up: A guide to parameter-efficient fine-tuning.” arXiv preprint arXiv:2303.15647 (2023).

2. Hands-On Code

Model: bloom-389m-zh
Dataset: alpaca_data_zh

2.1 Imports

from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer

2.2 Loading the Dataset

ds = Dataset.load_from_disk("./alpaca_data_zh/")

2.3 Preprocessing the Dataset

tokenizer = AutoTokenizer.from_pretrained("../Model/bloom-389m-zh")
tokenizer

def process_func(example):
    MAX_LENGTH = 256
    input_ids, attention_mask, labels = [], [], []
    # Tokenize the prompt (instruction plus optional input) and the response separately.
    instruction = tokenizer("\n".join(["Human: " + example["instruction"], example["input"]]).strip() + "\n\nAssistant: ")
    response = tokenizer(example["output"] + tokenizer.eos_token)
    input_ids = instruction["input_ids"] + response["input_ids"]
    attention_mask = instruction["attention_mask"] + response["attention_mask"]
    # Label prompt tokens with -100 so the loss covers only the response.
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"]
    # Truncate sequences longer than MAX_LENGTH.
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

tokenized_ds = ds.map(process_func, remove_columns=ds.column_names)
tokenized_ds
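The core idea of the preprocessing step is label masking: prompt tokens get the label -100 so that only response tokens contribute to the loss. The sketch below illustrates this without loading a real tokenizer. `toy_tokenize`, `EOS_ID`, and `build_example` are hypothetical names introduced only for this illustration; the one-id-per-character tokenizer is a stand-in, not how the BLOOM tokenizer works.

```python
IGNORE_INDEX = -100  # label value that the loss skips
EOS_ID = 0           # hypothetical end-of-sequence id for the toy tokenizer


def toy_tokenize(text):
    """Hypothetical stand-in for a real tokenizer: one id per character."""
    return [ord(ch) for ch in text]


def build_example(prompt, response, max_length=256):
    prompt_ids = toy_tokenize(prompt)
    response_ids = toy_tokenize(response) + [EOS_ID]
    input_ids = prompt_ids + response_ids
    # Mask the prompt so that only response tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": [1] * min(len(input_ids), max_length),
        "labels": labels[:max_length],
    }


prompt = "Human: hi\n\nAssistant: "
example = build_example(prompt, "hello")
print(example["labels"][:len(prompt)])  # prompt positions, all IGNORE_INDEX
print(example["labels"][len(prompt):])  # response ids followed by EOS_ID
```

One consequence of truncating after concatenation, both here and in `process_func`, is that the end-of-sequence token is cut off for any example longer than the maximum length.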

2.4 Creating the Model

model = AutoModelForCausalLM.from_pretrained("../Model/bloom-389m-zh",low_cpu_mem_usage=True)

2.5 BitFit Fine-Tuning*

# Train only the bias terms among the model's parameters;
# freeze every non-bias parameter.
num_param = 0
for name, param in model.named_parameters():
    if 'bias' not in name:
        param.requires_grad = False
    else:
        num_param += param.numel()
num_param
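It can help to see what fraction of the model BitFit actually trains. The sketch below wraps the same name-based selection in a function that reports trainable and total element counts. To stay runnable without PyTorch it uses `FakeParam`, a hypothetical stand-in for `torch.nn.Parameter`; the parameter names are BLOOM-style examples and the sizes are made up.

```python
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter (keeps the sketch torch-free)."""

    def __init__(self, numel):
        self._numel = numel
        self.requires_grad = True

    def numel(self):
        return self._numel


def apply_bitfit(named_params):
    """Freeze non-bias parameters; return (trainable, total) element counts."""
    trainable = total = 0
    for name, param in named_params:
        # Same name test as the loop above: train only parameters named "bias".
        param.requires_grad = "bias" in name
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


# Illustrative BLOOM-style names with made-up sizes.
params = {
    "transformer.word_embeddings.weight": FakeParam(40),
    "transformer.h.0.self_attention.query_key_value.weight": FakeParam(48),
    "transformer.h.0.self_attention.query_key_value.bias": FakeParam(12),
    "transformer.h.0.mlp.dense_h_to_4h.weight": FakeParam(64),
    "transformer.h.0.mlp.dense_h_to_4h.bias": FakeParam(16),
}

trainable, total = apply_bitfit(params.items())
print(f"trainable: {trainable} / {total} ({trainable / total:.2%})")
```

With a real model, the same function could be called as `apply_bitfit(model.named_parameters())`, since PyTorch parameters expose both `requires_grad` and `numel()`.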

2.6 Configuring Training Arguments

args = TrainingArguments(
    output_dir="./chatbot",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=1
)
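These settings interact: a per-device batch size of 1 with 4 gradient accumulation steps gives an effective batch size of 4 per optimizer update on one device. The arithmetic below is a rough sketch. `num_devices` and `num_examples` are assumptions (the source does not state the dataset size), and the exact rounding of steps per epoch can differ between transformers versions.

```python
import math

per_device_train_batch_size = 1
gradient_accumulation_steps = 4
logging_steps = 10
num_devices = 1        # assumption: a single GPU
num_examples = 10_000  # hypothetical dataset size, for illustration only

# Examples consumed per optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Approximate optimizer updates in one epoch.
updates_per_epoch = math.ceil(num_examples / effective_batch)
# Approximate number of log entries, one every `logging_steps` updates.
log_entries = updates_per_epoch // logging_steps

print(effective_batch)    # 4
print(updates_per_epoch)  # 2500
print(log_entries)        # 250
```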

2.7 Creating the Trainer

trainer = Trainer(
    args=args,
    model=model,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, padding=True)
)
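With `padding=True`, the collator pads every sequence in a batch to the length of the longest one, and by default pads labels with -100 so the padded positions are ignored by the loss. The pure-Python sketch below only approximates that behavior for illustration. `pad_batch` is a hypothetical helper, `pad_token_id=0` is an assumed value, and it always pads on the right, whereas the real collator follows the tokenizer's padding side and returns tensors.

```python
def pad_batch(features, pad_token_id=0, label_pad_token_id=-100):
    """Pad a list of tokenized examples to the longest sequence in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * pad)
        batch["attention_mask"].append(f["attention_mask"] + [0] * pad)
        # Padded label positions are ignored by the loss.
        batch["labels"].append(f["labels"] + [label_pad_token_id] * pad)
    return batch


features = [
    {"input_ids": [5, 6, 7], "attention_mask": [1, 1, 1], "labels": [-100, 6, 7]},
    {"input_ids": [8], "attention_mask": [1], "labels": [8]},
]
batch = pad_batch(features)
print(batch["input_ids"])
print(batch["attention_mask"])
print(batch["labels"])
```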

2.8 Training the Model

trainer.train()

2.9 Inference

from transformers import pipeline

# device=0 places the pipeline on the first GPU.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

ipt = "Human: {}\n{}".format("考试有哪些技巧？", "").strip() + "\n\nAssistant: "
pipe(ipt, max_length=256, do_sample=True, temperature=0.5)
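The inference prompt has to follow the same template that was used during preprocessing, otherwise the model sees a format it was not trained on. One way to keep the two in sync is a small shared helper. `build_prompt` is a hypothetical function introduced for this sketch; it reproduces the string construction used in `process_func` and in the inference prompt.

```python
def build_prompt(instruction, input_text=""):
    """Build a prompt in the 'Human: ... Assistant: ' format used for training."""
    return "\n".join(["Human: " + instruction, input_text]).strip() + "\n\nAssistant: "


# Without extra input, the trailing newline is stripped before the suffix is added.
print(repr(build_prompt("考试有哪些技巧？")))
# With extra input, it goes on its own line after the instruction.
print(repr(build_prompt("Translate to English", "你好")))
```

The returned string could then be passed to `pipe(...)` in place of `ipt`.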
