AutoencoderKL.scaling_factor and VaeImageProcessor

While working on an example of using AutoencoderKL and AutoencoderTiny (TAESD), I stumbled over the use of AutoencoderKL.scaling_factor.

It’s a factor that is apparently necessary for using the VAE with existing Stable Diffusion models, but it isn’t applied by any of the class’s own methods, nor by the VaeImageProcessor class. Is that intentional?

In the past, I’ve used the StableDiffusionPipeline#decode_latents method to apply it, but that method now emits this warning:

The decode_latents method is deprecated and will be removed in a future version. Please use VaeImageProcessor instead.

How am I to replace my usage of this deprecated method?

Diffusers applies vae.config.scaling_factor manually within the image generation pipelines, rather than within the SD VAE itself. If you are working with the Diffusers SD VAE module you’ll need to do the same thing. Examples:
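
Here’s a minimal sketch of what the pipelines do, with a stub standing in for a trained AutoencoderKL so the arithmetic is visible (in real code you’d call vae.encode(...) / vae.decode(...) on torch tensors and read the factor from vae.config.scaling_factor):

```python
import numpy as np

# Stand-in for a trained AutoencoderKL; the x4 "compression" is made up
# just so encode/decode are distinguishable and invertible.
class StubVae:
    scaling_factor = 0.18215  # the value used by the SD 1.x VAE

    def encode(self, images):
        return images * 4.0

    def decode(self, latents):
        return latents / 4.0

vae = StubVae()
images = np.random.rand(1, 3, 64, 64).astype(np.float32)

# The pipelines multiply by scaling_factor *after* encoding ...
latents = vae.encode(images) * vae.scaling_factor
# ... and divide *before* decoding, because the VAE itself never does.
decoded = vae.decode(latents / vae.scaling_factor)

assert np.allclose(decoded, images, atol=1e-5)
```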

So the deprecation message is a misdirect, and the class it references does not replace it?

In that case, why deprecate the decode_latents method at all?

Is there ever a time when one would call vae.encode or vae.decode without applying this scaling factor?

So the deprecation message is a misdirect, and the class it references does not replace it?

Kinda. decode_latents was doing several things:

  1. Applying the scaling_factor
  2. Running the VAE decoder
  3. Rescaling decoded outputs from [-1, 1] → [0, 1] and clamping
  4. Converting the output from an NCHW torch GPU tensor to an NHWC numpy CPU array.

The VaeImageProcessor only does steps 3 and 4 of that, so it’s a partial replacement.
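
Putting those four steps together, a decode_latents replacement looks roughly like this (a numpy stub stands in for the real vae.decode, and the last two steps are what VaeImageProcessor.postprocess handles when given a torch tensor):

```python
import numpy as np

scaling_factor = 0.18215  # vae.config.scaling_factor in real code

def stub_decode(latents):
    # Stand-in for vae.decode(latents).sample: maps 4 latent channels
    # to 3 image channels with values in [-1, 1].
    return np.tanh(latents[:, :3])

latents = np.random.randn(2, 4, 8, 8).astype(np.float32)

# Step 1: apply the scaling factor.
scaled = latents / scaling_factor
# Step 2: run the VAE decoder (stubbed here), giving NCHW in [-1, 1].
decoded = stub_decode(scaled)
# Step 3: rescale [-1, 1] -> [0, 1] and clamp.
images = np.clip(decoded / 2 + 0.5, 0.0, 1.0)
# Step 4: NCHW -> NHWC (plus, with torch, .cpu().numpy()).
images = images.transpose(0, 2, 3, 1)
```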

Is there ever a time when one would call vae.encode or vae.decode without applying this scaling factor?

I’m not aware of any (this is why TAESD does not have a scaling_factor). I suspect the separate scaling_factor is inherited from Stability / CompVis code where they treat the scaling_factor as a parameter of the diffusion model (i.e. “how much did we scale the VAE latents down when training this particular diffusion model?”) rather than as a property of the VAE.

Right. Given that (a) misleading deprecation warnings are bad, and (b) methods that never return the correct values are bad, I request that we either

  1. Complete the migration of the VAE scaling_factor out of the Pipeline classes and into the Autoencoder class, always applying it to the values leaving encode and going into decode. Or,
  2. Provide methods wrapping the autoencoder that handle the scaling. This could mean keeping the decode_latents (and a corresponding encode) method on the Pipeline, or it could mean adding that scaling to the VaeImageProcessor’s preprocessing and postprocessing methods.
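
Option 2 could be sketched as a wrapper like the following (the names ScaledVae, encode_scaled, and decode_scaled are hypothetical, not diffusers API; an identity stub makes the round trip checkable):

```python
import numpy as np

class ScaledVae:
    """Hypothetical wrapper that bakes scaling_factor into encode/decode."""

    def __init__(self, vae, scaling_factor):
        self.vae = vae
        self.scaling_factor = scaling_factor

    def encode_scaled(self, images):
        # Scale on the way out of the encoder, as the pipelines do.
        return self.vae.encode(images) * self.scaling_factor

    def decode_scaled(self, latents):
        # Unscale on the way into the decoder.
        return self.vae.decode(latents / self.scaling_factor)

# Toy identity VAE so the round trip is exact.
class IdentityVae:
    def encode(self, x):
        return x

    def decode(self, z):
        return z

wrapped = ScaledVae(IdentityVae(), scaling_factor=0.18215)
x = np.random.rand(1, 3, 16, 16)
assert np.allclose(wrapped.decode_scaled(wrapped.encode_scaled(x)), x)
```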

What’ll it be?

It looks like the “do scaling in encode / decode” approach was already proposed in [[wip] init scale_value on vae by williamberman · Pull Request #1515 · huggingface/diffusers · GitHub] but rejected in favor of [make scaling factor a config arg of vae/vqvae by patil-suraj · Pull Request #1860 · huggingface/diffusers · GitHub] due to backwards compatibility concerns (since lots of external code was already applying the scaling factor outside the VAE).

The “create new encode / decode methods that also scale / unscale, and use those wherever possible” approach (also mentioned there, but not implemented) sounds like a good idea to me. It’s plausible that the diffusers team would accept a PR doing just that. But it’s not my call (I’m just some random dev 🙂 )

Thank you Ollin! I’ve been hoping to draw out the maintainers who do make the API decisions, but I must have posted during summer vacation. Having the links to the previous discussions is handy though.