
LLMs / Embedding: A Detailed Guide to Qwen3 Embedding (Overview, Installation and Usage, and Application Cases)


Contents

Overview of Qwen3 Embedding

1. Features

2. Model List

3. Evaluation Results

MTEB (Multilingual)

MTEB (Eng v2)

C-MTEB (MTEB Chinese)

Reranker

How to Use Qwen3 Embedding

1. Installation

2. Usage

2.1 Using the Text Embedding Models

Usage with Transformers

Usage with vLLM

Usage with Sentence Transformers

2.2 Using the Text Reranking Models

Usage with Transformers

Usage with vLLM

Application Cases


Overview of Qwen3 Embedding

Released in June 2025, the Qwen3 Embedding model series is the latest model family in the Qwen lineup, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides text embedding and reranking models in a range of sizes (0.6B, 4B, and 8B). The series inherits the exceptional multilingual capability, long-text understanding, and reasoning ability of its foundation models, and represents a significant advance across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

GitHub: https://github.com/QwenLM/Qwen3-Embedding

1. Features

>> Exceptional versatility: The embedding models achieve state-of-the-art performance across a wide range of downstream application evaluations. As of June 5, 2025, the 8B embedding model ranks No. 1 on the MTEB multilingual leaderboard (score 70.58), and the reranking models excel in a variety of text retrieval scenarios.

>> Comprehensive flexibility: The Qwen3 Embedding series offers a full range of sizes (from 0.6B to 8B) for both embedding and reranking models, covering use cases that prioritize either efficiency or effectiveness, and developers can combine the two modules seamlessly. In addition, the embedding models allow flexible output dimensions across the full range, and both the embedding and reranking models support user-defined instructions to boost performance on specific tasks, languages, or scenarios.

>> Multilingual capability: Leveraging the multilingual capability of the Qwen3 models, the Qwen3 Embedding series supports over 100 languages, including a variety of programming languages, and delivers strong multilingual, cross-lingual, and code retrieval performance.

>> MRL support: MRL (Matryoshka Representation Learning) support indicates whether an embedding model supports custom dimensions for the final embedding.

>> Instruction aware: Instruction awareness indicates whether an embedding or reranking model supports customizing the input instruction for different tasks (see the sketch after this list).
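To illustrate the last two points, below is a minimal sketch of requesting a custom embedding dimension together with a custom task instruction via Sentence Transformers (assuming sentence-transformers>=2.7.0; the 256-dimension truncation and the instruction text are illustrative choices, not fixed values):

from sentence_transformers import SentenceTransformer

# MRL: `truncate_dim` keeps only the first 256 components of each embedding
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", truncate_dim=256)

# Instruction aware: prepend a task-specific instruction to the query
prompt = "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"
emb = model.encode(["What is the capital of China?"], prompt=prompt)
print(emb.shape)  # (1, 256)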

2. Model List

| Model Type | Model | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|---|---|---|---|---|---|---|---|
| Text Embedding | Qwen3-Embedding-0.6B | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-4B | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-8B | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-4B | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-8B | 8B | 36 | 32K | - | - | Yes |

3. Evaluation Results

The Qwen3 Embedding series achieves excellent results on the MTEB multilingual leaderboard; detailed results are given in the tables below.

MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | 29.16 | 83.63 | 65.58 | 67.71 | 79.40 |
| Qwen3-Embedding-0.6B | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| Qwen3-Embedding-4B | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | 11.56 | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| Qwen3-Embedding-8B | 8B | 70.58 | 61.69 | 80.89 | 74.00 | 57.65 | 10.06 | 28.66 | 86.40 | 65.63 | 70.88 | 81.08 |

MTEB (Eng v2)

| Model | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|---|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | 59.39 | 87.7 | 48.59 | 64.35 | 85.29 | 38.28 |
| Qwen3-Embedding-0.6B | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| Qwen3-Embedding-4B | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | 88.72 | 34.39 |
| Qwen3-Embedding-8B | 8B | 75.22 | 68.71 | 90.43 | 58.57 | 87.52 | 51.56 | 69.44 | 88.58 | 34.83 |

C-MTEB (MTEB Chinese)

| Model | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| Qwen3-Embedding-8B | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |

Reranker

| Model | Param | MTEB-R | CMTEB-R | MMTEB-R | MLDR | MTEB-Code | FollowIR |
|---|---|---|---|---|---|---|---|
| Qwen3-Embedding-0.6B | 0.6B | 61.82 | 71.02 | 64.64 | 50.26 | 75.41 | 5.09 |
| Jina-multilingual-reranker-v2-base | 0.3B | 58.22 | 63.37 | 63.73 | 39.66 | 58.98 | -0.68 |
| gte-multilingual-reranker-base | 0.3B | 59.51 | 74.08 | 59.44 | 66.33 | 54.18 | -1.64 |
| BGE-reranker-v2-m3 | 0.6B | 57.03 | 72.16 | 58.36 | 59.51 | 41.38 | -0.01 |
| Qwen3-Reranker-0.6B | 0.6B | 65.80 | 71.31 | 66.36 | 67.28 | 73.42 | 5.41 |
| Qwen3-Reranker-4B | 4B | 69.76 | 75.94 | 72.74 | 69.97 | 81.20 | 14.84 |
| Qwen3-Reranker-8B | 8B | 69.02 | 77.45 | 72.94 | 70.19 | 81.22 | 8.05 |

How to Use Qwen3 Embedding

1. Installation

Model downloads: https://huggingface.co/collections/Qwen/qwen3-embedding-6841b2055b99c44d9a4c371f
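Assuming a standard pip environment, the dependencies used by the samples below can be installed with `pip install "transformers>=4.51.0" "sentence-transformers>=2.7.0" "vllm>=0.8.5"` (the version pins are taken from the requirement comments in the sample code).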

2. Usage

The following sample code shows how to load and use the Qwen3-Embedding models (with Transformers, vLLM, and Sentence Transformers).

2.1 Using the Text Embedding Models

Usage with Transformers

# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
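Design note: the Qwen3 embedding models use last-token pooling, which is why the tokenizer is loaded with padding_side='left'. With left padding, the final position of every sequence in the batch holds a real token, so last_token_pool can simply take the last hidden state; the function also handles right padding by indexing each sequence at its own length.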
Usage with vLLM

# Requires vllm>=0.8.5
import torch
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
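The printed scores differ from the Transformers output above only in the later decimal places; small numerical deviations between inference backends are expected and do not change the ranking.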

Usage with Sentence Transformers
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414], [0.1355, 0.6000]])
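Note that `model.similarity` computes cosine similarity by default. Because the Transformers example above L2-normalizes its embeddings before the dot product, the two examples compute the same quantity, which is why the score matrices agree up to rounding.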

2.2 Using the Text Reranking Models

Usage with Transformers

# Requires transformers>=4.51.0
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc)
    return output

def process_inputs(pairs):
    inputs = tokenizer(
        pairs, padding=False, truncation='longest_first',
        return_attention_mask=False,
        max_length=max_length - len(prefix_tokens) - len(suffix_tokens)
    )
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs

@torch.no_grad()
def compute_logits(inputs, **kwargs):
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()

token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)
print("scores: ", scores)
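Each returned score is the probability of the "yes" token under a softmax over the "yes"/"no" logits at the final position, i.e. the model's estimate that the document meets the instruction for the query; higher means more relevant.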

Usage with vLLM
# Requires vllm>=0.8.5
import math

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import destroy_model_parallel
from vllm.inputs.data import TokensPrompt

def format_instruction(instruction, query, doc):
    text = [
        {"role": "system", "content": "Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\"."},
        {"role": "user", "content": f"<Instruct>: {instruction}\n\n<Query>: {query}\n\n<Document>: {doc}"}
    ]
    return text

def process_inputs(pairs, instruction, max_length, suffix_tokens):
    messages = [format_instruction(instruction, query, doc) for query, doc in pairs]
    messages = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=False, enable_thinking=False
    )
    # Truncate each prompt, then append the fixed assistant suffix tokens
    messages = [ele[:max_length] + suffix_tokens for ele in messages]
    messages = [TokensPrompt(prompt_token_ids=ele) for ele in messages]
    return messages

def compute_logits(model, messages, sampling_params, true_token, false_token):
    outputs = model.generate(messages, sampling_params, use_tqdm=False)
    scores = []
    for i in range(len(outputs)):
        final_logits = outputs[i].outputs[0].logprobs[-1]
        if true_token not in final_logits:
            true_logit = -10
        else:
            true_logit = final_logits[true_token].logprob
        if false_token not in final_logits:
            false_logit = -10
        else:
            false_logit = final_logits[false_token].logprob
        true_score = math.exp(true_logit)
        false_score = math.exp(false_logit)
        score = true_score / (true_score + false_score)
        scores.append(score)
    return scores

number_of_gpu = torch.cuda.device_count()
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Reranker-0.6B')
model = LLM(model='Qwen/Qwen3-Reranker-0.6B', tensor_parallel_size=number_of_gpu,
            max_model_len=10000, enable_prefix_caching=True, gpu_memory_utilization=0.8)
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
max_length = 8192
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)
true_token = tokenizer("yes", add_special_tokens=False).input_ids[0]
false_token = tokenizer("no", add_special_tokens=False).input_ids[0]
sampling_params = SamplingParams(
    temperature=0, max_tokens=1,
    logprobs=20, allowed_token_ids=[true_token, false_token],
)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
pairs = list(zip(queries, documents))
inputs = process_inputs(pairs, task, max_length - len(suffix_tokens), suffix_tokens)
scores = compute_logits(model, inputs, sampling_params, true_token, false_token)
print('scores', scores)

destroy_model_parallel()
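Design note: the SamplingParams above generate exactly one token (max_tokens=1) constrained to the "yes"/"no" token ids, and compute_logits turns the returned logprobs into a normalized relevance probability. enable_prefix_caching=True lets vLLM reuse the KV cache for the shared system-prompt prefix across all query-document pairs.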

Application Cases

The Qwen3 Embedding series is suitable for a wide range of text embedding and ranking tasks, including:
>> Text retrieval: retrieve documents relevant to a query.
>> Code retrieval: retrieve code snippets relevant to a query.
>> Text classification: assign texts to categories.
>> Text clustering: group similar texts together.
>> Bitext mining: find corresponding text pairs across two collections.
A minimal retrieve-then-rerank sketch that combines the embedding and reranking models follows below.
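The sketch ranks a small corpus with the embedding model and then hands the top candidates to the reranker (assumptions: sentence-transformers>=2.7.0 is installed; the corpus, query, and top_k value are illustrative; stage 2 reuses the process_inputs/compute_logits helpers from section 2.2 rather than redefining them here):

from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

corpus = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
    "Qwen3 Embedding models support over 100 languages.",
]
query = "What is the capital of China?"

# Stage 1: embedding-based recall, ranking the corpus by cosine similarity
query_emb = embedder.encode([query], prompt_name="query")
corpus_emb = embedder.encode(corpus)
scores = embedder.similarity(query_emb, corpus_emb)[0]
top_k = 2
candidate_ids = scores.argsort(descending=True)[:top_k].tolist()
candidates = [corpus[i] for i in candidate_ids]
print("recall candidates:", candidates)

# Stage 2: precision reranking, scoring each (query, candidate) pair with
# Qwen3-Reranker via the helpers from section 2.2:
# pairs = [format_instruction(task, query, doc) for doc in candidates]
# rerank_scores = compute_logits(process_inputs(pairs))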

