Decoding latents to RGB without upscaling

After some empirical tests, I have determined that I can get a useful approximation of the RGB output using a linear combination of the latent channels.

[Image: sd-latent-channels-linear-approximation]

This approximation comes from multiplying the four latent channels by these factors:

v1_4_rgb_latent_factors = [
    #   R       G       B
    [ 0.298,  0.207,  0.208],  # L1
    [ 0.187,  0.286,  0.173],  # L2
    [-0.158,  0.189,  0.264],  # L3
    [-0.184, -0.271, -0.473],  # L4
]

[This is for Stable Diffusion v1.4. I assume it’s not universally true.]
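In case it's useful, here's roughly how I apply those factors to turn a latent tensor into a preview image. The function name, the einsum contraction, and the final remap to [0, 1] for display are just how I'd sketch it, not anything canonical, and it reuses the factor list above:

import torch

def latents_to_rgb_approx(latents, factors=None):
    """Cheap RGB preview of SD v1.4 latents.

    latents: tensor of shape (4, H/8, W/8), one sample's latent channels.
    Returns a float tensor of shape (H/8, W/8, 3), clamped to [0, 1].
    """
    if factors is None:
        factors = torch.tensor(v1_4_rgb_latent_factors)
    # Linear combination over the channel axis:
    # rgb[y, x, c] = sum_k latents[k, y, x] * factors[k, c]
    rgb = torch.einsum("khw,kc->hwc", latents, factors.to(latents))
    # Latent values roughly span [-1, 1]; remap to [0, 1] for display.
    # (This remap is an assumption and may be part of why it looks undersaturated.)
    return (rgb * 0.5 + 0.5).clamp(0.0, 1.0)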

Here’s the output from the actual VAE decoder for comparison:
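(For anyone wanting to reproduce the comparison, this is roughly what I mean by the actual decode: the diffusers AutoencoderKL with the usual 0.18215 latent scaling factor for SD v1.x, applied to the same latents tensor as above. Treat it as a sketch rather than exactly what I ran.)

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

with torch.no_grad():
    # latents: (1, 4, H/8, W/8), as produced by the sampler
    decoded = vae.decode(latents / 0.18215).sample  # (1, 3, H, W), roughly in [-1, 1]
    image = (decoded / 2 + 0.5).clamp(0, 1)         # remap to [0, 1] for viewing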

The approximation is a little undersaturated and could probably stand a bit of tuning, but it's not bad considering it's over two thousand times faster than the full decode.

So that’s useful to know. But I certainly feel like I did this the hard way, by analyzing some outputs and developing a new approximation of them. Hopefully there’s an easier way to determine those values or some other similarly cheap approximation?