AutoencoderKL.scaling_factor and VaeImageProcessor

While working on an example of using AutoencoderKL and AutoencoderTiny (TAESD), I stumbled over the use of AutoencoderKL.scaling_factor.

It’s a factor that is apparently necessary for using the VAE with existing Stable Diffusion models, but it isn’t applied by any of the class’s own methods, nor by the VaeImageProcessor class. Is that intentional?

In the past, I’ve used the StableDiffusionPipeline#decode_latents method to apply it, but that method now emits this warning:

The decode_latents method is deprecated and will be removed in a future version. Please use VaeImageProcessor instead.

How am I to replace my usage of this deprecated method?

Diffusers applies vae.config.scaling_factor manually within the image generation pipelines, rather than within the SD VAE itself. If you are working with the Diffusers SD VAE module you’ll need to do the same thing. Examples:
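
Here’s a minimal sketch of what the pipelines do, with a stub standing in for a trained AutoencoderKL so the arithmetic is visible (in real code you’d call vae.encode(...) / vae.decode(...) on torch tensors and read the factor from vae.config.scaling_factor):

```python
import numpy as np

# Stand-in for a trained AutoencoderKL; the x4 "compression" is made up
# just so encode/decode are distinguishable and invertible.
class StubVae:
    scaling_factor = 0.18215  # the value used by the SD 1.x VAE

    def encode(self, images):
        return images * 4.0

    def decode(self, latents):
        return latents / 4.0

vae = StubVae()
images = np.random.rand(1, 3, 64, 64).astype(np.float32)

# The pipelines multiply by scaling_factor *after* encoding ...
latents = vae.encode(images) * vae.scaling_factor
# ... and divide *before* decoding, because the VAE itself never does.
decoded = vae.decode(latents / vae.scaling_factor)

assert np.allclose(decoded, images, atol=1e-5)
```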

So the deprecation message is a misdirect, and the class it references does not replace it?

In that case, why deprecate the decode_latents method at all?

Is there ever a time when one would call vae.encode or vae.decode without applying this scaling factor?

So the deprecation message is a misdirect, and the class it references does not replace it?

Kinda. decode_latents was doing several things:

  1. Applying the scaling_factor
  2. Running the VAE decoder
  3. Rescaling decoded outputs from [-1, 1] → [0, 1] and clamping
  4. Converting the output from an NCHW torch GPU tensor to an NHWC numpy CPU array.

The VaeImageProcessor only does steps 3 and 4 of that, so it’s a partial replacement.
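
Putting those four steps together, a decode_latents replacement looks roughly like this (a numpy stub stands in for the real vae.decode, and the last two steps are what VaeImageProcessor.postprocess handles when given a torch tensor):

```python
import numpy as np

scaling_factor = 0.18215  # vae.config.scaling_factor in real code

def stub_decode(latents):
    # Stand-in for vae.decode(latents).sample: maps 4 latent channels
    # to 3 image channels with values in [-1, 1].
    return np.tanh(latents[:, :3])

latents = np.random.randn(2, 4, 8, 8).astype(np.float32)

# Step 1: apply the scaling factor.
scaled = latents / scaling_factor
# Step 2: run the VAE decoder (stubbed here), giving NCHW in [-1, 1].
decoded = stub_decode(scaled)
# Step 3: rescale [-1, 1] -> [0, 1] and clamp.
images = np.clip(decoded / 2 + 0.5, 0.0, 1.0)
# Step 4: NCHW -> NHWC (plus, with torch, .cpu().numpy()).
images = images.transpose(0, 2, 3, 1)
```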

Is there ever a time when one would call vae.encode or vae.decode without applying this scaling factor?

I’m not aware of any (this is why TAESD does not have a scaling_factor). I suspect the separate scaling_factor is inherited from Stability / CompVis code where they treat the scaling_factor as a parameter of the diffusion model (i.e. “how much did we scale the VAE latents down when training this particular diffusion model?”) rather than as a property of the VAE.

Right. Given that (a) misleading deprecation warnings are bad, and (b) methods that never return the correct values are bad, I request that we either

  1. Complete the migration of the VAE scaling_factor out of the Pipeline classes and into the Autoencoder class, always applying it to the values leaving encode and going into decode. Or,
  2. Provide methods wrapping the autoencoder that handle the scaling. This could mean keeping the decode_latents (and a corresponding encode) method on the Pipeline, or it could mean adding that scaling to the VaeImageProcessor’s preprocessing and postprocessing methods.
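
Option 2 could be sketched as a wrapper like the following (the names ScaledVae, encode_scaled, and decode_scaled are hypothetical, not diffusers API; an identity stub makes the round trip checkable):

```python
import numpy as np

class ScaledVae:
    """Hypothetical wrapper that bakes scaling_factor into encode/decode."""

    def __init__(self, vae, scaling_factor):
        self.vae = vae
        self.scaling_factor = scaling_factor

    def encode_scaled(self, images):
        # Scale on the way out of the encoder, as the pipelines do.
        return self.vae.encode(images) * self.scaling_factor

    def decode_scaled(self, latents):
        # Unscale on the way into the decoder.
        return self.vae.decode(latents / self.scaling_factor)

# Toy identity VAE so the round trip is exact.
class IdentityVae:
    def encode(self, x):
        return x

    def decode(self, z):
        return z

wrapped = ScaledVae(IdentityVae(), scaling_factor=0.18215)
x = np.random.rand(1, 3, 16, 16)
assert np.allclose(wrapped.decode_scaled(wrapped.encode_scaled(x)), x)
```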

What’ll it be?

It looks like the “do scaling in encode / decode” approach was already proposed in [[wip] init scale_value on vae by williamberman · Pull Request #1515 · huggingface/diffusers · GitHub] but rejected in favor of [make scaling factor a config arg of vae/vqvae by patil-suraj · Pull Request #1860 · huggingface/diffusers · GitHub] due to backwards compatibility concerns (since lots of external code was already applying the scaling factor outside the VAE).

The “create new encode / decode methods that also scale / unscale, and use those wherever possible” approach (also mentioned there, but not implemented) sounds like a good idea to me. It’s plausible that the diffusers team would accept a PR doing just that. But it’s not my call (I’m just some random dev 🙂 )

Thank you Ollin! I’ve been hoping to draw out the maintainers who do make the API decisions, but I must have posted during summer vacation. Having the links to the previous discussions is handy though.