ZeRO3 with int8 training


I am trying to run mixed int8 training together with DeepSpeed ZeRO3 across multiple GPUs, and I have run into a problem that I hope someone can help clarify.

It seems that mixed int8 requires passing a device_map when loading the model with from_pretrained(), and passing a device_map automatically sets low_cpu_mem_usage=True. However, ZeRO3 requires low_cpu_mem_usage=False. What would be a reasonable solution here?
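To make the conflict concrete, here is a small hypothetical sketch. It does not call transformers or DeepSpeed. The names plan_loading and LoadPlan are made up, and device_map="auto" is only an illustrative value. The function encodes the two constraints exactly as stated above: mixed int8 needs a device_map, passing a device_map implies low_cpu_mem_usage=True, and ZeRO3 needs low_cpu_mem_usage=False.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadPlan:
    """Hypothetical summary of the from_pretrained() kwargs discussed above."""
    device_map: Optional[str]
    low_cpu_mem_usage: bool


def plan_loading(use_int8: bool, zero_stage: int) -> LoadPlan:
    """Hypothetical helper that encodes only the two constraints from the question.

    - mixed int8 needs a device_map ("auto" is just an illustrative value);
    - passing any device_map turns on low_cpu_mem_usage=True;
    - ZeRO3 needs low_cpu_mem_usage=False.
    Raises ValueError when no combination of kwargs satisfies all of them.
    """
    device_map = "auto" if use_int8 else None
    # A device_map implies low_cpu_mem_usage=True.
    low_cpu_mem_usage = device_map is not None
    if zero_stage == 3 and low_cpu_mem_usage:
        raise ValueError(
            "mixed int8 needs device_map (=> low_cpu_mem_usage=True), "
            "but ZeRO3 needs low_cpu_mem_usage=False"
        )
    return LoadPlan(device_map=device_map, low_cpu_mem_usage=low_cpu_mem_usage)


if __name__ == "__main__":
    # ZeRO3 without int8: no device_map, so low_cpu_mem_usage stays False.
    print(plan_loading(use_int8=False, zero_stage=3))
    # -> LoadPlan(device_map=None, low_cpu_mem_usage=False)

    # ZeRO3 with int8: the two requirements contradict each other.
    try:
        plan_loading(use_int8=True, zero_stage=3)
    except ValueError as err:
        print("conflict:", err)
```

The second call raises ValueError, which mirrors the question: as described, no single set of from_pretrained() arguments satisfies both requirements.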

Thanks in advance for any pointers!