Writing Your Own Inference Pipeline with diffusers

news 2025/7/1 5:16:01

English source: Stable Diffusion with 🧨 Diffusers

Writing your own inference pipeline

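Finally, we show how to build a custom pipeline with diffusers. Writing a custom inference pipeline is an advanced use of the diffusers library. It lets you swap out individual components, such as the VAE or the scheduler explained above.

As an example, we show how to use Stable Diffusion with a different scheduler, namely Katherine Crowson's K-LMS scheduler, which was added in this PR.

The pretrained model includes all the components needed to set up a complete diffusion pipeline. They are stored in the following folders: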

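text_encoder: Stable Diffusion uses CLIP, but other diffusion models may use other encoders, such as BERT.
tokenizer: it must match the one used by the text_encoder model.
scheduler: the scheduling algorithm used to progressively add noise to the image during training.
unet: the model used to generate the latent representation of the input.
vae: the autoencoder module, which we use to decode latent representations into real images.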

We can load each component by referring to the folder in which it was saved, using the subfolder argument of from_pretrained.

import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

# 1. Load the autoencoder model which will be used to decode the latents into image space.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

# 2. Load the tokenizer and text encoder to tokenize and encode the text.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# 3. The UNet model for generating the latents.
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

Now, instead of loading the predefined scheduler, we load the K-LMS scheduler with some fitting parameters.

from diffusers import LMSDiscreteScheduler

scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
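To make the scheduler parameters more concrete, here is a minimal sketch of what a "scaled_linear" beta schedule computes. It assumes the schedule interpolates linearly between the square roots of beta_start and beta_end and then squares the result, which is our understanding of how diffusers defines it. The helper names scaled_linear_betas and sigmas_from_betas are hypothetical and not part of the diffusers API, and the code uses plain Python rather than torch.

```python
import math

def scaled_linear_betas(beta_start, beta_end, num_train_timesteps):
    """Illustrative helper (not a diffusers function): linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]

def sigmas_from_betas(betas):
    """Noise levels sigma_t = sqrt((1 - abar_t) / abar_t), where abar_t = prod(1 - beta_s)."""
    sigmas, alpha_cumprod = [], 1.0
    for beta in betas:
        alpha_cumprod *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_cumprod) / alpha_cumprod))
    return sigmas

# Same values as passed to LMSDiscreteScheduler above.
betas = scaled_linear_betas(0.00085, 0.012, 1000)
sigmas = sigmas_from_betas(betas)
print(f"beta range:  {betas[0]:.5f} .. {betas[-1]:.5f}")
print(f"sigma range: {sigmas[0]:.4f} .. {sigmas[-1]:.4f}")
```

Under these assumptions the noise level grows monotonically with the training timestep, so sampling starts from the largest sigma and works down toward the smallest.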

Next, let's move the models to the GPU.

torch_device = "cuda"
vae.to(torch_device)
text_encoder.to(torch_device)
unet.to(torch_device)

We now define the parameters we'll use to generate images.

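Note that guidance_scale is defined analogously to the guidance weight w of equation (2) in the Imagen paper. guidance_scale == 1 corresponds to doing no classifier-free guidance. Here we set it to 7.5, as we did previously.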
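As a rough illustration of why guidance_scale == 1 amounts to no guidance, here is a minimal sketch of the usual classifier-free guidance combination: the unconditional prediction plus guidance_scale times the difference between the text-conditioned and unconditional predictions. The helper apply_guidance and the toy numbers are hypothetical; in a real pipeline the two inputs would be UNet outputs, not short lists.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Hypothetical helper: blend unconditional and text-conditioned noise predictions."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Toy values standing in for the two UNet noise predictions.
noise_uncond = [0.10, -0.20, 0.30]
noise_text = [0.30, 0.00, 0.10]

no_guidance = apply_guidance(noise_uncond, noise_text, 1.0)  # reduces to noise_text
guided = apply_guidance(noise_uncond, noise_text, 7.5)       # pushed further toward the prompt
print(no_guidance)
print(guided)
```

With a scale of 1 the unconditional term cancels and only the text-conditioned prediction remains. Larger values extrapolate past it, away from the unconditional prediction.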

In contrast to the previous examples, we set num_inference_steps to 100 to get a more sharply defined image.

prompt = ["a photograph of an astronaut riding a horse"]
height = 512                        # default height of Stable Diffusion
width = 512                         # default width of Stable Diffusion
num_inference_steps = 100           # Number of denoising steps
guidance_scale = 7.5                # Scale for classifier-free guidance
generator = torch.manual_seed(0)    # Seed generator to create the initial latent noise
batch_size = len(prompt)
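The height and width above are given in pixels, but the UNet operates on a smaller latent tensor. A minimal sketch of the shape arithmetic follows. It assumes the Stable Diffusion v1 autoencoder downsamples by a factor of 8 and uses 4 latent channels. Both values are hard-coded assumptions here rather than read from the loaded models, and latent_shape is a hypothetical helper.

```python
VAE_SCALE_FACTOR = 8   # assumption: spatial downsampling factor of the v1 autoencoder
LATENT_CHANNELS = 4    # assumption: number of latent channels the UNet expects

def latent_shape(batch_size, height, width):
    """Hypothetical helper: shape of the initial latent noise for a given image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)

shape = latent_shape(batch_size=1, height=512, width=512)
print(shape)
```

Under these assumptions, a 512x512 image corresponds to a 64x64 latent, which is why denoising in latent space is much cheaper than denoising pixels directly.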

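First, we get the text_embeddings for the prompt we passed in. These embeddings are used to condition the UNet model and guide the image generation toward something that resembles the input prompt.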

text_input = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt")

text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
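To illustrate what padding="max_length" combined with truncation=True does to the token ids, here is a toy sketch in plain Python. It does not call the real tokenizer. pad_or_truncate is a hypothetical helper, and the values 77 (maximum length), 49406 (start token) and 49407 (end and padding token) are assumptions about the CLIP tokenizer that should be checked against tokenizer.model_max_length and the tokenizer's special tokens.

```python
MAX_LENGTH = 77   # assumption: tokenizer.model_max_length for CLIP
BOS_ID = 49406    # assumption: start-of-text token id
EOS_ID = 49407    # assumption: end-of-text token id, also used for padding

def pad_or_truncate(word_ids, max_length=MAX_LENGTH):
    """Hypothetical helper: force every prompt to exactly max_length token ids."""
    body = word_ids[: max_length - 2]        # leave room for the start and end tokens
    ids = [BOS_ID] + body + [EOS_ID]
    return ids + [EOS_ID] * (max_length - len(ids))

short_ids = pad_or_truncate([320, 1125, 539])          # made-up ids for a short prompt
long_ids = pad_or_truncate(list(range(1000, 1200)))    # made-up ids for an overlong prompt
print(len(short_ids), len(long_ids))
```

Because every prompt ends up with the same length, the text encoder always produces an embedding tensor of the same shape, which keeps batching simple regardless of prompt length.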
