How to get intermediate output images

dkackman · January 7, 2023, 11:49pm

Is it possible to get the images at each denoising step via the Diffusers library? I am sure I’ve seen it done but can’t find where or how.

pcuenq · January 8, 2023, 11:34am

Hi @dkackman!

You might want to look at the callback mechanism, which sends intermediate latents to a function you specify. You could then decode the latents in that function and visualize them as you need.

This notebook includes a section about callbacks that demonstrates how to use that feature.

Good luck!

dkackman · January 8, 2023, 5:53pm

Oh perfect. I was unclear on how to transform the latents into an image, but this is exactly what I was looking for.

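import torch

# Assumes `pipe` is an already loaded Stable Diffusion pipeline and that an
# `image_grid` helper is defined elsewhere (neither is shown in this post).
vae = pipe.vae
images = []

def latents_callback(i, t, latents):
    # Undo the latent scaling used by the Stable Diffusion VAE.
    latents = 1 / 0.18215 * latents
    # Decode the first latent in the batch to an image tensor in [-1, 1].
    image = vae.decode(latents).sample[0]
    # Map to [0, 1], then convert the CHW tensor to an HWC NumPy array.
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(1, 2, 0).numpy()
    images.extend(pipe.numpy_to_pil(image))

prompt = "Portrait painting of Jeremy Howard looking happy."
torch.manual_seed(9000)
# Invoke the callback every 12 denoising steps.
final_image = pipe(prompt, callback=latents_callback, callback_steps=12).images[0]
images.append(final_image)
image_grid(images, rows=1, cols=len(images))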

venkatesh-thiru · March 17, 2025, 5:55pm

What's with the scaling in latents = 1 / 0.18215 * latents? Is it a constant for every VAE? Can I still apply the same callback for SD3.5?

John6666 · March 18, 2025, 6:02am

I think the same method can be used for the Diffusers pipeline.

Pipeline callbacks

Explanation of the 0.18215 factor in textual_inversion?

github.com/huggingface/diffusers

opened 01:21AM - 09 Sep 22 UTC

closed 01:07PM - 09 Sep 22 UTC

garrett361

https://github.com/huggingface/diffusers/blob/b2b3b1a8ab83b020ecaf32f45de3ef2364…4331cf/examples/textual_inversion/textual_inversion.py#L501 Hi, just a small question about the quoted script above which is bothering me: where does this `0.18215` number come from? What computation is being done? Is it from some paper? I have seen the same factor elsewhere, too, without explanation. Any guidance would be very helpful, thanks!
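On the scaling question above, here is a minimal sketch rather than a definitive answer. It assumes the VAE exposes `scaling_factor` (and, for some model families, `shift_factor`) on `vae.config`, so the callback reads the values from there instead of hardcoding 0.18215. It also assumes the newer `callback_on_step_end` hook described on the Pipeline callbacks page. The names `unscale_latents` and `make_preview_callback` are made up for this example, and I have not verified it against SD3.5 specifically.

```python
def unscale_latents(latents, vae_config):
    """Undo the latent normalisation using values read from the VAE config.

    Falls back to 0.18215 only when the config has no `scaling_factor`.
    """
    scaling = getattr(vae_config, "scaling_factor", 0.18215)
    shift = getattr(vae_config, "shift_factor", None)
    latents = latents / scaling
    if shift is not None:
        # Some VAEs also shift the latents; add the shift back after unscaling.
        latents = latents + shift
    return latents


def make_preview_callback(images):
    """Build a step-end callback that appends decoded previews to `images`."""

    def on_step_end(pipe, step, timestep, callback_kwargs):
        latents = unscale_latents(callback_kwargs["latents"], pipe.vae.config)
        # return_dict=False makes decode() return a tuple; item 0 is the image batch.
        decoded = pipe.vae.decode(latents, return_dict=False)[0]
        # postprocess() converts the [-1, 1] tensor batch to a list of PIL images.
        images.extend(pipe.image_processor.postprocess(decoded, output_type="pil"))
        # The pipeline expects the (possibly modified) kwargs back.
        return callback_kwargs

    return on_step_end


# Usage, assuming `pipe` and `prompt` already exist (not run here):
# images = []
# final_image = pipe(
#     prompt,
#     callback_on_step_end=make_preview_callback(images),
#     callback_on_step_end_tensor_inputs=["latents"],
# ).images[0]
```

Decoding on every step is slow, so in practice you may want to skip steps inside `on_step_end` (for example, only decode when `step % 12 == 0`).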
