What does "0.18215" mean in the blog post Stable Diffusion with 🧨 Diffusers?

In the blog post Stable Diffusion with 🧨 Diffusers, there is this line:
[image: screenshot of the line of code that contains 0.18215]
What is “0.18215”, and why should I do this?

Also, the code for generating without the pipeline does not work for stable_1.5. It only generates a normal picture after I accidentally added one line of code.

import torch
from tqdm.auto import tqdm

# I added this line: it sets the timesteps the scheduler will use for inference.
scheduler.set_timesteps(num_inference_steps)

for t in tqdm(scheduler.timesteps):
  # Duplicate the latents so one forward pass covers both the
  # unconditional and the text-conditioned prediction.
  latent_model_input = torch.cat([latents] * 2)
  latent_model_input = scheduler.scale_model_input(latent_model_input, t)

  # Predict the noise residual.
  with torch.no_grad():
    noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

  # Apply classifier-free guidance.
  noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
  noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

  # Step the scheduler to get the less noisy latents.
  latents = scheduler.step(noise_pred, t, latents).prev_sample

Thank you!

This question came up on GitHub too:

Patrick suggested moving that bit into the VAE so you don’t need to specify it in the pipeline, but I don’t think anyone has opened a PR for that yet.
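
To make that suggestion concrete, here is a rough sketch of what "moving it to the VAE" could look like. Everything in it is hypothetical: ScaledVAE, encode_fn and decode_fn are made-up names for illustration, and the lambdas are toy stand-ins so it runs without a model. It is not how diffusers implements anything, only an illustration of keeping the factor next to the VAE so that callers never multiply or divide by 0.18215 themselves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScaledVAE:
    """Hypothetical wrapper that stores the scaling factor with the VAE."""

    encode_fn: Callable[[List[float]], List[float]]  # stand-in for a real encoder
    decode_fn: Callable[[List[float]], List[float]]  # stand-in for a real decoder
    scaling_factor: float = 0.18215

    def encode(self, image: List[float]) -> List[float]:
        # Multiply the raw latents by the factor on the way in.
        return [z * self.scaling_factor for z in self.encode_fn(image)]

    def decode(self, latents: List[float]) -> List[float]:
        # Divide by the factor before handing the latents to the decoder.
        return self.decode_fn([z / self.scaling_factor for z in latents])


# Toy encoder/decoder (assumption: NOT a real VAE), just to exercise the wrapper.
toy_vae = ScaledVAE(
    encode_fn=lambda pixels: [p * 5.0 for p in pixels],
    decode_fn=lambda latents: [z / 5.0 for z in latents],
)

latents = toy_vae.encode([1.0, -2.0])
restored = toy_vae.decode(latents)
print(latents, restored)
```

With a wrapper like this, the generation loop would call encode and decode and never mention the constant directly.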

Thank you! I will take a look.

I’ve written some code to estimate that value, in case it helps: Explanation of the 0.18215 factor in textual_inversion? · Issue #437 · huggingface/diffusers · GitHub
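
For readers who just want the gist, the idea can be sketched as follows. This is not the code from the linked issue. It assumes the factor is chosen as 1 / std of the VAE latents so that the scaled latents have roughly unit standard deviation, and it uses synthetic Gaussian numbers in place of real encoder outputs, so the value it prints says nothing about the actual model. estimate_scaling_factor is a made-up helper name.

```python
import random
import statistics


def estimate_scaling_factor(latent_values):
    """Return 1 / std of a flat list of latent values."""
    return 1.0 / statistics.pstdev(latent_values)


# Synthetic stand-in for VAE encoder outputs (assumption: NOT real latents).
# The std is set to 1 / 0.18215 purely so the sketch has something to recover.
rng = random.Random(0)
fake_latents = [rng.gauss(0.0, 1.0 / 0.18215) for _ in range(100_000)]

factor = estimate_scaling_factor(fake_latents)

# After multiplying by the factor, the values have a standard deviation of about 1.
scaled_std = statistics.pstdev([z * factor for z in fake_latents])

print(f"estimated factor: {factor:.4f}, std after scaling: {scaled_std:.4f}")
```

With a real model you would replace fake_latents with the flattened latents obtained by encoding a batch of training images.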
