Program not working on GPU but works on CPU

Or maybe we should write some code to make better use of the GPU while keeping it as float32…
For example, quantizing, or placing only the VAE on the GPU…
Well, if speed becomes an issue, we'll just work it out by trial and error.
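
Before the trial and error, a rough estimate of weight memory might narrow down what is worth trying. The sketch below is plain Python with no GPU library involved. The component names, parameter counts, and VRAM budget are all made-up assumptions for illustration, and `weight_bytes` and `plan_placement` are hypothetical helpers, not part of any library. It counts weights only, so real usage (activations, buffers) will be higher.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Memory for the weights alone (activations and buffers are not counted)."""
    return num_params * BYTES_PER_PARAM[dtype]


def plan_placement(components: dict, dtype: str, budget_bytes: int) -> dict:
    """Greedily assign the smallest components to the GPU until the budget runs out."""
    placement = {}
    remaining = budget_bytes
    # Visit components from smallest to largest parameter count.
    for name, num_params in sorted(components.items(), key=lambda kv: kv[1]):
        need = weight_bytes(num_params, dtype)
        if need <= remaining:
            placement[name] = "cuda"
            remaining -= need
        else:
            placement[name] = "cpu"
    return placement


# Hypothetical parameter counts, for illustration only.
components = {"vae": 80_000_000, "text_encoder": 120_000_000, "unet": 860_000_000}
budget = 512 * 1024**2  # assume 512 MiB of VRAM is free for weights

print(plan_placement(components, "float32", budget))
print(plan_placement(components, "int8", budget))
```

With these made-up numbers, float32 leaves room for only the VAE on the GPU, while int8 also fits the text encoder. The real figures depend on the actual model, so this is only a starting point for the experiments.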