While working on an example of using AutoencoderKL and AutoencoderTiny (TAESD), I stumbled over the use of AutoencoderKL.scaling_factor.
It’s some factor that is necessary for using the VAE with existing Stable Diffusion models, but is not applied by any of the class’s methods, nor by the VAE Image Processor class?
Diffusers applies vae.config.scaling_factor manually within the image generation pipelines, rather than within the SD VAE itself. If you are working with the Diffusers SD VAE module you’ll need to do the same thing. Examples:
Rescaling decoded outputs from [-1, 1] → [0, 1] and clamping
Converting tensor format from NCHW torch GPU to NHWC numpy CPU.
The VaeImageProcessor only does steps 3 and 4 of that, so it’s a partial replacement.
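For reference, the whole sequence can be sketched as one standalone function. This is a rough sketch rather than the library's code: the name `decode_latents_sketch` is hypothetical, steps 1 and 2 (undoing the scaling, then calling the decoder) are my assumption about what comes before the two steps listed above, and `vae` is assumed to be a Diffusers `AutoencoderKL` with `latents` as a torch tensor in NCHW layout. It needs no imports because it only calls methods on the objects passed in.

```python
def decode_latents_sketch(vae, latents):
    """Hypothetical helper: scaled latents (NCHW torch tensor) -> images (NHWC numpy, values in [0, 1])."""
    # Step 1 (assumed): undo the scaling that the pipeline applied after encoding.
    latents = latents / vae.config.scaling_factor
    # Step 2 (assumed): run the VAE decoder; `.sample` holds the decoded image tensor.
    image = vae.decode(latents).sample
    # Step 3: rescale from [-1, 1] to [0, 1] and clamp.
    image = (image / 2 + 0.5).clamp(0, 1)
    # Step 4: detach from autograd, move to CPU, reorder NCHW -> NHWC,
    # cast to float32, and convert to a numpy array.
    return image.detach().cpu().permute(0, 2, 3, 1).float().numpy()
```

Passing the output of `vae.decode(...)` straight to the VaeImageProcessor would cover steps 3 and 4 only, so the division in step 1 still has to happen somewhere in your own code.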
Is there ever a time when one would call vae.encode or vae.decode without applying this scaling factor?
I’m not aware of any (this is why TAESD does not have a scaling_factor). I suspect the separate scaling_factor is inherited from Stability / CompVis code where they treat the scaling_factor as a parameter of the diffusion model (i.e. “how much did we scale the VAE latents down when training this particular diffusion model?”) rather than as a property of the VAE.
Right. Given that (a) misleading deprecation warnings are bad, and (b) methods that never return the correct values are bad, I request that we either
Complete the migration of the VAE scaling_factor away from the Pipeline class and into the Autoencoder class. Always apply it to the values leaving encode and going into decode. Or,
Provide methods wrapping the autoencoder that do the necessary thing. This could mean keeping decode_latents (and a corresponding encode) method on the Pipeline, or it could mean adding that scaling to the VaeImageProcessor’s preprocessing and postprocessing methods.
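To make the second option concrete, here is a minimal sketch of what such a wrapper could look like. `ScaledVAE` is a hypothetical name and not something Diffusers provides; it assumes the wrapped object behaves like `AutoencoderKL`, meaning `encode(...)` returns an object with a `latent_dist` that has `sample()` and `mode()`, `decode(...)` returns an object with `.sample`, and the factor lives at `vae.config.scaling_factor`. It would not fit AutoencoderTiny as written, since that class returns latents directly.

```python
class ScaledVAE:
    """Hypothetical wrapper (not part of Diffusers) that always applies scaling_factor."""

    def __init__(self, vae):
        self.vae = vae
        self.scaling_factor = vae.config.scaling_factor

    def encode(self, images, sample=True):
        # Encode to a latent distribution, then pick a sample or the mode.
        dist = self.vae.encode(images).latent_dist
        latents = dist.sample() if sample else dist.mode()
        # Scale on the way out of encode, as the pipelines do by hand.
        return latents * self.scaling_factor

    def decode(self, latents):
        # Undo the scaling on the way into decode.
        return self.vae.decode(latents / self.scaling_factor).sample
```

With something like this, calling code would use `wrapper.encode(images)` and `wrapper.decode(latents)` and never touch the factor directly. The rescaling from [-1, 1] to [0, 1] and the format conversion are left out on purpose, since the VaeImageProcessor already covers those.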
The “create new encode / decode methods that also scale / unscale, and use those wherever possible” approach (also mentioned, but not implemented) sounds like a good idea to me. It’s plausible that the diffusers team would accept a PR doing just that. But it’s not my call (I’m just some random dev).
Thank you Ollin! I’ve been hoping to draw out the maintainers who do make the API decisions, but I must have posted during summer vacation. Having the links to the previous discussions is handy though.