I’m building a text-to-video model from scratch in PyTorch and using diffusers as a reference for the VAE architecture.