
An analysis of the Stable Diffusion modules in the diffusers library


In diffusers, the Stable Diffusion v1.5 pipeline is made up of the following components:

Out[3]: dict_keys(['vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler', 'safety_checker', 'feature_extractor'])
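A listing like the one above can be obtained from the pipeline's `components` dictionary. The sketch below is one possible way to do it, not necessarily how the original output was produced. The model id is taken from the cache path that appears in the tokenizer dump further down and may no longer resolve on the Hub; `load_pipeline` and `component_classes` are hypothetical helper names introduced for illustration.

```python
def load_pipeline(model_id="runwayml/stable-diffusion-v1-5"):
    """Load the pipeline from the Hub or the local cache (requires `diffusers`)."""
    # Imported lazily so the helper below can be used without diffusers installed.
    from diffusers import StableDiffusionPipeline
    return StableDiffusionPipeline.from_pretrained(model_id)


def component_classes(pipe):
    """Map each component name to the class name of the object behind it."""
    return {name: type(module).__name__ for name, module in pipe.components.items()}


# Usage (not executed here, because it downloads several GB of weights):
#   pipe = load_pipeline()
#   print(pipe.components.keys())   # the dict_keys listing
#   print(pipe.text_encoder)        # the module structure of one component
```

Printing a single component, for example `pipe.unet`, yields the kind of nested module listing shown in the sections that follow.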

The structure of each component is given below.

“text_encoder block”

CLIPTextModel((text_model): CLIPTextTransformer((embeddings): CLIPTextEmbeddings((token_embedding): Embedding(49408, 768)(position_embedding): Embedding(77, 768))(encoder): CLIPEncoder((layers): ModuleList((0-11): 12 x CLIPEncoderLayer((self_attn): CLIPAttention((k_proj): Linear(in_features=768, out_features=768, bias=True)(v_proj): Linear(in_features=768, out_features=768, bias=True)(q_proj): Linear(in_features=768, out_features=768, bias=True)(out_proj): Linear(in_features=768, out_features=768, bias=True))(layer_norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)(mlp): CLIPMLP((activation_fn): QuickGELUActivation()(fc1): Linear(in_features=768, out_features=3072, bias=True)(fc2): Linear(in_features=3072, out_features=768, bias=True))(layer_norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True))))(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True))
)
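As a sanity check on the text encoder dump, the layer shapes can be turned into a parameter count. This is a small arithmetic sketch that only uses the sizes printed above (vocabulary 49408, 77 positions, hidden size 768, MLP size 3072, 12 layers); the helper names are made up for illustration.

```python
def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a Linear layer."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm_params(dim):
    """Scale and shift vectors of a LayerNorm with elementwise_affine=True."""
    return 2 * dim


VOCAB, MAX_POS, HIDDEN, MLP, LAYERS = 49408, 77, 768, 3072, 12

embeddings = VOCAB * HIDDEN + MAX_POS * HIDDEN           # token + position tables
attention = 4 * linear_params(HIDDEN, HIDDEN)            # q, k, v and out projections
mlp = linear_params(HIDDEN, MLP) + linear_params(MLP, HIDDEN)
per_layer = attention + mlp + 2 * layer_norm_params(HIDDEN)

total = embeddings + LAYERS * per_layer + layer_norm_params(HIDDEN)  # + final_layer_norm
print(f"{total:,}")  # 123,060,480
```

So the text encoder holds roughly 123 million parameters, of which about 38 million sit in the embedding tables.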

“vae block”

AutoencoderKL((encoder): Encoder((conv_in): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(down_blocks): ModuleList((0): DownEncoderBlock2D((resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 128, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 128, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(2, 2)))))(1): DownEncoderBlock2D((resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 128, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 256, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(128, 256, kernel_size=(1, 1), stride=(1, 1)))(1): ResnetBlock2D((norm1): GroupNorm(32, 256, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 256, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(2, 2)))))(2): DownEncoderBlock2D((resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 256, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 
1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(256, 512, kernel_size=(1, 1), stride=(1, 1)))(1): ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(2, 2)))))(3): DownEncoderBlock2D((resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))))(mid_block): UNetMidBlock2D((attentions): ModuleList((0): Attention((group_norm): GroupNorm(32, 512, eps=1e-06, affine=True)(to_q): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_k): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_v): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(1): Dropout(p=0.0, inplace=False))))(resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU())))(conv_norm_out): GroupNorm(32, 512, eps=1e-06, affine=True)(conv_act): SiLU()(conv_out): Conv2d(512, 8, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(decoder): Decoder((conv_in): Conv2d(4, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(up_blocks): ModuleList((0-1): 2 x UpDecoderBlock2D((resnets): ModuleList((0-2): 3 x ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(upsamplers): ModuleList((0): Upsample2D((conv): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))))(2): UpDecoderBlock2D((resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 256, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(512, 256, kernel_size=(1, 1), stride=(1, 1)))(1-2): 2 x ResnetBlock2D((norm1): GroupNorm(32, 256, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 256, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(upsamplers): ModuleList((0): Upsample2D((conv): LoRACompatibleConv(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))))(3): UpDecoderBlock2D((resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 256, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 128, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, 
inplace=False)(conv2): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(256, 128, kernel_size=(1, 1), stride=(1, 1)))(1-2): 2 x ResnetBlock2D((norm1): GroupNorm(32, 128, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 128, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))))(mid_block): UNetMidBlock2D((attentions): ModuleList((0): Attention((group_norm): GroupNorm(32, 512, eps=1e-06, affine=True)(to_q): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_k): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_v): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=512, out_features=512, bias=True)(1): Dropout(p=0.0, inplace=False))))(resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 512, eps=1e-06, affine=True)(conv1): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(norm2): GroupNorm(32, 512, eps=1e-06, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU())))(conv_norm_out): GroupNorm(32, 128, eps=1e-06, affine=True)(conv_act): SiLU()(conv_out): Conv2d(128, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))(quant_conv): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1))(post_quant_conv): Conv2d(4, 4, kernel_size=(1, 1), stride=(1, 1))
)
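Two details of the VAE dump are worth spelling out. The encoder ends in 8 channels because it predicts a mean and a log-variance for each of the 4 latent channels, and the three `Downsample2D` layers (stride 2) shrink each spatial dimension by a factor of 8. The sketch below computes the resulting latent shape. It assumes that the downsampling convolution, which is printed without padding, receives one extra row and column of padding on the bottom and right edge before it is applied; `conv_out_size` and `latent_shape` are hypothetical helpers.

```python
def conv_out_size(size, kernel, stride, pad_before=0, pad_after=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + pad_before + pad_after - kernel) // stride + 1


def latent_shape(height, width, num_downsamplers=3, latent_channels=4):
    """Shape (C, H, W) of the latent produced for an image of the given size."""
    for _ in range(num_downsamplers):
        # Assumption: asymmetric padding of one pixel (bottom/right only)
        # ahead of each 3x3, stride-2 convolution.
        height = conv_out_size(height, kernel=3, stride=2, pad_after=1)
        width = conv_out_size(width, kernel=3, stride=2, pad_after=1)
    return (latent_channels, height, width)


print(latent_shape(512, 512))  # (4, 64, 64)
```

This matches the UNet below, whose `conv_in` expects 4 input channels.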

“unet block”

UNet2DConditionModel((conv_in): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_proj): Timesteps()(time_embedding): TimestepEmbedding((linear_1): LoRACompatibleLinear(in_features=320, out_features=1280, bias=True)(act): SiLU()(linear_2): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True))(down_blocks): ModuleList((0): CrossAttnDownBlock2D((attentions): ModuleList((0-1): 2 x Transformer2DModel((norm): GroupNorm(32, 320, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(320, 320, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_k): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_v): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=320, bias=False)(to_v): LoRACompatibleLinear(in_features=768, out_features=320, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=320, out_features=2560, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=1280, out_features=320, bias=True)))))(proj_out): LoRACompatibleConv(320, 320, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 320, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(320, 320, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=320, bias=True)(norm2): GroupNorm(32, 320, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))))(1): CrossAttnDownBlock2D((attentions): ModuleList((0-1): 2 x Transformer2DModel((norm): GroupNorm(32, 640, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(640, 640, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_k): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_v): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=640, bias=False)(to_v): LoRACompatibleLinear(in_features=768, out_features=640, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=640, out_features=5120, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=2560, out_features=640, bias=True)))))(proj_out): LoRACompatibleConv(640, 640, kernel_size=(1, 1), stride=(1, 1))))(resnets): 
ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 320, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=640, bias=True)(norm2): GroupNorm(32, 640, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(320, 640, kernel_size=(1, 1), stride=(1, 1)))(1): ResnetBlock2D((norm1): GroupNorm(32, 640, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=640, bias=True)(norm2): GroupNorm(32, 640, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))))(2): CrossAttnDownBlock2D((attentions): ModuleList((0-1): 2 x Transformer2DModel((norm): GroupNorm(32, 1280, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_v): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): 
LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_v): LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=1280, out_features=10240, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=5120, out_features=1280, bias=True)))))(proj_out): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 640, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(640, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(640, 1280, kernel_size=(1, 1), stride=(1, 1)))(1): ResnetBlock2D((norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))(downsamplers): ModuleList((0): Downsample2D((conv): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)))))(3): DownBlock2D((resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 
1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()))))(up_blocks): ModuleList((0): UpBlock2D((resnets): ModuleList((0-2): 3 x ResnetBlock2D((norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(2560, 1280, kernel_size=(1, 1), stride=(1, 1))))(upsamplers): ModuleList((0): Upsample2D((conv): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))))(1): CrossAttnUpBlock2D((attentions): ModuleList((0-2): 3 x Transformer2DModel((norm): GroupNorm(32, 1280, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_v): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_v): 
LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=1280, out_features=10240, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=5120, out_features=1280, bias=True)))))(proj_out): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(2560, 1280, kernel_size=(1, 1), stride=(1, 1)))(2): ResnetBlock2D((norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1920, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(1920, 1280, kernel_size=(1, 1), stride=(1, 1))))(upsamplers): ModuleList((0): Upsample2D((conv): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))))(2): CrossAttnUpBlock2D((attentions): ModuleList((0-2): 3 x Transformer2DModel((norm): GroupNorm(32, 640, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(640, 640, kernel_size=(1, 1), 
stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_k): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_v): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=640, out_features=640, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=640, bias=False)(to_v): LoRACompatibleLinear(in_features=768, out_features=640, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=640, out_features=640, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=640, out_features=5120, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=2560, out_features=640, bias=True)))))(proj_out): LoRACompatibleConv(640, 640, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1920, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=640, bias=True)(norm2): GroupNorm(32, 640, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(1920, 640, kernel_size=(1, 1), stride=(1, 1)))(1): ResnetBlock2D((norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1280, 640, kernel_size=(3, 3), 
stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=640, bias=True)(norm2): GroupNorm(32, 640, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(1280, 640, kernel_size=(1, 1), stride=(1, 1)))(2): ResnetBlock2D((norm1): GroupNorm(32, 960, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(960, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=640, bias=True)(norm2): GroupNorm(32, 640, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(960, 640, kernel_size=(1, 1), stride=(1, 1))))(upsamplers): ModuleList((0): Upsample2D((conv): LoRACompatibleConv(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))))(3): CrossAttnUpBlock2D((attentions): ModuleList((0-2): 3 x Transformer2DModel((norm): GroupNorm(32, 320, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(320, 320, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(attn1): Attention((to_q): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_k): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_v): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=320, out_features=320, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=320, bias=False)(to_v): 
LoRACompatibleLinear(in_features=768, out_features=320, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=320, out_features=320, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=320, out_features=2560, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=1280, out_features=320, bias=True)))))(proj_out): LoRACompatibleConv(320, 320, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0): ResnetBlock2D((norm1): GroupNorm(32, 960, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(960, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=320, bias=True)(norm2): GroupNorm(32, 320, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(960, 320, kernel_size=(1, 1), stride=(1, 1)))(1-2): 2 x ResnetBlock2D((norm1): GroupNorm(32, 640, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(640, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=320, bias=True)(norm2): GroupNorm(32, 320, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU()(conv_shortcut): LoRACompatibleConv(640, 320, kernel_size=(1, 1), stride=(1, 1))))))(mid_block): UNetMidBlock2DCrossAttn((attentions): ModuleList((0): Transformer2DModel((norm): GroupNorm(32, 1280, eps=1e-06, affine=True)(proj_in): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))(transformer_blocks): ModuleList((0): BasicTransformerBlock((norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn1): 
Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_v): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(attn2): Attention((to_q): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=False)(to_k): LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_v): LoRACompatibleLinear(in_features=768, out_features=1280, bias=False)(to_out): ModuleList((0): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(1): Dropout(p=0.0, inplace=False)))(norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)(ff): FeedForward((net): ModuleList((0): GEGLU((proj): LoRACompatibleLinear(in_features=1280, out_features=10240, bias=True))(1): Dropout(p=0.0, inplace=False)(2): LoRACompatibleLinear(in_features=5120, out_features=1280, bias=True)))))(proj_out): LoRACompatibleConv(1280, 1280, kernel_size=(1, 1), stride=(1, 1))))(resnets): ModuleList((0-1): 2 x ResnetBlock2D((norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)(conv1): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(time_emb_proj): LoRACompatibleLinear(in_features=1280, out_features=1280, bias=True)(norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)(dropout): Dropout(p=0.0, inplace=False)(conv2): LoRACompatibleConv(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))(nonlinearity): SiLU())))(conv_norm_out): GroupNorm(32, 320, eps=1e-05, affine=True)(conv_act): SiLU()(conv_out): Conv2d(320, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
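The input widths of the up-block resnets in the dump (2560, 1920, 1280, 960, 640) come from concatenating the running feature map with a skip connection saved on the way down. The sketch below reproduces those numbers from the four block widths alone. It is a bookkeeping model of the channel counts under the assumption that every down-path resnet and downsampler (plus `conv_in`) pushes one skip tensor, not a reimplementation of the UNet.

```python
BLOCK_OUT_CHANNELS = [320, 640, 1280, 1280]
LAYERS_PER_BLOCK = 2  # resnets per down block; up blocks use one more


def skip_channels():
    """Channel counts of the skip tensors, in the order they are saved."""
    skips = [BLOCK_OUT_CHANNELS[0]]                      # output of conv_in
    for i, ch in enumerate(BLOCK_OUT_CHANNELS):
        skips += [ch] * LAYERS_PER_BLOCK                 # one per resnet
        if i < len(BLOCK_OUT_CHANNELS) - 1:
            skips.append(ch)                             # one per downsampler
    return skips


def up_resnet_in_channels():
    """Input channels of every resnet in each up block."""
    skips = skip_channels()
    prev = BLOCK_OUT_CHANNELS[-1]                        # mid block output
    result = []
    for ch in reversed(BLOCK_OUT_CHANNELS):
        block = []
        for _ in range(LAYERS_PER_BLOCK + 1):
            block.append(prev + skips.pop())             # concat along channels
            prev = ch                                    # each resnet outputs ch
        result.append(block)
    return result


for i, widths in enumerate(up_resnet_in_channels()):
    print(f"up_blocks[{i}] resnet input channels: {widths}")
```

The four rows come out as [2560, 2560, 2560], [2560, 2560, 1920], [1920, 1280, 960] and [960, 640, 640], which agrees with the `GroupNorm` widths printed for `up_blocks` above.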

“feature_extractor block”

CLIPImageProcessor {"crop_size": {"height": 224,"width": 224},"do_center_crop": true,"do_convert_rgb": true,"do_normalize": true,"do_rescale": true,"do_resize": true,"feature_extractor_type": "CLIPFeatureExtractor","image_mean": [0.48145466,0.4578275,0.40821073],"image_processor_type": "CLIPImageProcessor","image_std": [0.26862954,0.26130258,0.27577711],"resample": 3,"rescale_factor": 0.00392156862745098,"size": {"shortest_edge": 224},"use_square_size": false
}
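The configuration above describes a resize to a shortest edge of 224, a 224x224 center crop, a rescale by 1/255 and a per-channel normalization. The following sketch applies the resize rule and the rescale-then-normalize step to plain numbers. The constants are copied from the config; the integer truncation of the longer edge is an assumption about how the processor rounds, and both function names are hypothetical.

```python
IMAGE_MEAN = (0.48145466, 0.4578275, 0.40821073)
IMAGE_STD = (0.26862954, 0.26130258, 0.27577711)
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255


def resized_size(height, width, shortest_edge=224):
    """(height, width) after scaling the shorter side to `shortest_edge`."""
    short, long = min(height, width), max(height, width)
    new_long = int(shortest_edge * long / short)  # assumption: truncation, not rounding
    return (shortest_edge, new_long) if height <= width else (new_long, shortest_edge)


def normalize_pixel(rgb):
    """Map one 0..255 RGB triple to the normalized values fed to the model."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


print(resized_size(480, 640))          # (224, 298)
print(normalize_pixel((255, 255, 255)))
```

In this pipeline the feature extractor prepares decoded images for the safety checker rather than for the denoising loop.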

“tokenizer block”

CLIPTokenizer(name_or_path='/home/tiger/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9/tokenizer', vocab_size=49408, model_max_length=77, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={49406: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),49407: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
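The tokenizer settings above (`model_max_length=77`, right-side padding and truncation, `<|endoftext|>` doubling as the pad token) determine the layout of the ids that reach the text encoder. The sketch below builds that layout by hand for an already tokenized prompt. It assumes padding to the maximum length with truncation enabled, and the two prompt ids in the usage line are hypothetical placeholders, not real vocabulary lookups.

```python
BOS_ID = 49406        # <|startoftext|>
EOS_ID = 49407        # <|endoftext|>
PAD_ID = EOS_ID       # pad_token is also <|endoftext|>
MAX_LENGTH = 77       # model_max_length


def build_input_ids(token_ids, max_length=MAX_LENGTH):
    """Wrap prompt token ids in BOS/EOS, then truncate and pad on the right."""
    body = list(token_ids)[: max_length - 2]            # leave room for BOS and EOS
    ids = [BOS_ID] + body + [EOS_ID]
    return ids + [PAD_ID] * (max_length - len(ids))


ids = build_input_ids([320, 1125])  # hypothetical ids for a two-token prompt
print(len(ids), ids[:5])            # 77 [49406, 320, 1125, 49407, 49407]
```

The fixed length of 77 matches the `position_embedding` table of the text encoder.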

“safety_checker block”

StableDiffusionSafetyChecker((vision_model): CLIPVisionModel((vision_model): CLIPVisionTransformer((embeddings): CLIPVisionEmbeddings((patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)(position_embedding): Embedding(257, 1024))(pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)(encoder): CLIPEncoder((layers): ModuleList((0-23): 24 x CLIPEncoderLayer((self_attn): CLIPAttention((k_proj): Linear(in_features=1024, out_features=1024, bias=True)(v_proj): Linear(in_features=1024, out_features=1024, bias=True)(q_proj): Linear(in_features=1024, out_features=1024, bias=True)(out_proj): Linear(in_features=1024, out_features=1024, bias=True))(layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)(mlp): CLIPMLP((activation_fn): QuickGELUActivation()(fc1): Linear(in_features=1024, out_features=4096, bias=True)(fc2): Linear(in_features=4096, out_features=1024, bias=True))(layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True))))(post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)))(visual_projection): Linear(in_features=1024, out_features=768, bias=False)
)
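The 257 entries of the vision model's `position_embedding` follow from the patch size. A short check, assuming the 224x224 input produced by the feature extractor's `crop_size` and one extra class token (`num_positions` is a hypothetical helper):

```python
def num_positions(image_size=224, patch_size=14):
    """Sequence length seen by the vision transformer: patches plus a class token."""
    patches_per_side = image_size // patch_size   # 14x14 patches, stride 14
    return patches_per_side ** 2 + 1


print(num_positions())  # 257
```

The final `visual_projection` then maps the 1024-wide pooled feature to a 768-dimensional vector.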

“scheduler block”

PNDMScheduler {"_class_name": "PNDMScheduler","_diffusers_version": "0.22.3","beta_end": 0.012,"beta_schedule": "scaled_linear","beta_start": 0.00085,"clip_sample": false,"num_train_timesteps": 1000,"prediction_type": "epsilon","set_alpha_to_one": false,"skip_prk_steps": true,"steps_offset": 1,"timestep_spacing": "leading","trained_betas": null
}
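The scheduler config can be unpacked into concrete numbers. With `beta_schedule` set to `scaled_linear`, the betas are spaced linearly in square-root space between `beta_start` and `beta_end` and then squared. The sketch below computes them, the cumulative product of `1 - beta`, and the basic `leading` timestep grid with `steps_offset` 1. It uses only the standard library, the function names are hypothetical, and the timestep grid is the plain spacing only: PNDM additionally repeats one timestep in its final schedule, which is not modeled here.

```python
import math


def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    """Betas that are linear in sqrt(beta) between the two endpoints."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    n = num_train_timesteps
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]


def alphas_cumprod(betas):
    """Running product of (1 - beta_t), the signal fraction left at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=1):
    """'leading' spacing: multiples of the step ratio, shifted by the offset."""
    ratio = num_train_timesteps // num_inference_steps
    return [i * ratio + steps_offset for i in range(num_inference_steps)][::-1]


betas = scaled_linear_betas()
acp = alphas_cumprod(betas)
print(betas[0], betas[-1])
print(acp[0], acp[-1])
print(leading_timesteps(50)[:3], leading_timesteps(50)[-1])  # [981, 961, 941] 1
```

With `prediction_type` set to `epsilon`, the UNet output is interpreted as the noise that was added at the given timestep.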