I want to test out a way to generate videos from text. For that I need a UNet3D, but initialized with the pretrained weights of a T2I model. Is it possible to load CompVis’s stable-diffusion weights into the UNet3DConditional?