Stable Diffusion FP16 on multi-GPU setups?


With most HuggingFace models you can spread a model across multiple GPUs to pool their VRAM by using HF Accelerate and passing the model kwarg device_map="auto".

However, when you do that with the Stable Diffusion model, you get errors about ops being unimplemented on CPU for half(). Is there a way around this without switching to FP32 (e.g., a device_map that covers everything except the CPU, or dynamically swapping model parts from RAM to VRAM as needed)?
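For context, one direction I've been looking at is constraining the placement so no weights ever land on CPU. Accelerate-style loaders accept a max_memory dict alongside device_map, and setting the CPU budget to zero should force all shards onto GPUs. This is a sketch, not a verified fix: gpu_only_max_memory is a hypothetical helper I wrote for illustration, and the diffusers kwargs / model id in the comments are assumptions about the API, not something I've confirmed works here.

```python
# Sketch: build an accelerate-style max_memory map that allots VRAM to each
# GPU and nothing to the CPU, so no fp16 weights get placed on CPU.
# The GiB numbers are illustrative, not measured.

def gpu_only_max_memory(num_gpus, per_gpu_gib):
    """Return a max_memory dict like {0: "10GiB", 1: "10GiB", "cpu": "0GiB"}."""
    memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    memory["cpu"] = "0GiB"  # forbid CPU placement entirely
    return memory

# Hypothetical usage with diffusers (untested, requires torch + diffusers;
# device_map="balanced" is the pipeline-level sharding mode in recent diffusers):
#
# import torch
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5",
#     torch_dtype=torch.float16,
#     device_map="balanced",
#     max_memory=gpu_only_max_memory(2, 10),  # keep all components on GPUs
# )
#
# For the "swap from RAM to VRAM as needed" idea, diffusers pipelines also
# expose pipe.enable_sequential_cpu_offload(), which keeps weights in RAM
# and moves each module to the GPU only for its forward pass.
```

If something like this works, the zero-CPU budget would sidestep the unimplemented-on-CPU half() ops, since placement (not compute) is what puts fp16 tensors on the CPU in the first place.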