VAE: change the shape of the latent space

I’m trying to get a diffusion model to generate pixel art, but speed is important (I’m hoping to turn it into a real-time interactive game, so I’m aiming for ~30 fps), so I can’t just generate a 512x512 image and then downscale it.

I’ve followed the blog post blog/stable_diffusion, which helped a lot in getting started, but now I want to generate a smaller image, which is proving hard to figure out because everyone else seems to want to generate bigger images.

When I try to decrease the size of the output image, the quality drops, because the shape of the latent space is chosen automatically from the requested output image’s shape (so asking for a 64x64 image leaves me with only an 8x8 latent):

    shape = (
        batch_size,
        unet.config.in_channels,
        height // 8,  # latent height is 1/8 of the requested image height
        width // 8,   # latent width is 1/8 of the requested image width
    )
    # Seed the generator to create the initial latent noise
    generator = torch.manual_seed(0)
    latents = torch.randn(shape, generator=generator)

But if I change the shape so that my image is (for example) 64x64 and the latent is also 64x64, then when I decode the latents the output image comes out as 512x512, because (as far as I can tell) the VAE decoder always upsamples by a factor of 8 rather than targeting the image size I asked for.
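
A quick shape-only check (random latents, no scaling, since only the tensor shapes matter here) shows what I mean:

    import torch
    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "CompVis/stable-diffusion-v1-4", subfolder="vae"
    )

    # Random 64x64 "latents" -- only the decoded shape matters here
    latents = torch.randn(1, vae.config.latent_channels, 64, 64)
    with torch.no_grad():
        image = vae.decode(latents).sample

    print(image.shape)  # torch.Size([1, 3, 512, 512]), i.e. always 8x the latent size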

Is it possible to tell the VAE what size the latent space should be? I’m defining the VAE like so:

    from diffusers import AutoencoderKL

    vae = AutoencoderKL.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        subfolder="vae",
        variant="fp16",  # load the half-precision weights
    )
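
From poking around the pipeline source, the 8x factor seems to be baked into the VAE architecture (the number of up/downsampling blocks) rather than being something you pass in at decode time:

    # As far as I can tell, the pipeline derives its fixed scale factor
    # from the VAE config, not from the requested image size:
    vae_scale_factor = 2 ** (len(vae.config.block_out_channels) - 1)
    print(vae_scale_factor)  # 8 for stable-diffusion-v1-4

So is the only option to feed the decoder an 8x8 latent to get a 64x64 image, or can the upsampling factor itself be changed?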