I am trying to run mixed int8 training with DeepSpeed ZeRO-3 across multiple GPUs and have run into a problem that I hope someone can help clarify.
It seems that mixed int8 requires specifying device_map when loading the model (from_pretrained()), which automatically sets low_cpu_mem_usage=True. However, ZeRO-3 requires low_cpu_mem_usage=False. What would be a reasonable solution here?
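To make the conflict concrete, here is a small sketch that encodes the three rules as I understand them. It does not call transformers or DeepSpeed: check_setup and resolve_low_cpu_mem_usage are hypothetical helpers, and the rules are my reading of the behavior described above, not the libraries' actual code.

```python
# Hypothetical model of the loading constraints described above.
# It does NOT call transformers or DeepSpeed; it only mirrors the stated rules:
#   1. int8 loading needs a device_map,
#   2. passing a device_map forces low_cpu_mem_usage=True,
#   3. ZeRO-3 needs low_cpu_mem_usage=False.
from typing import List, Optional


def resolve_low_cpu_mem_usage(device_map: Optional[str], low_cpu_mem_usage: bool) -> bool:
    """Return the effective low_cpu_mem_usage after the device_map override (rule 2)."""
    return True if device_map is not None else low_cpu_mem_usage


def check_setup(load_in_8bit: bool, device_map: Optional[str], zero3: bool,
                low_cpu_mem_usage: bool = False) -> List[str]:
    """List which of the stated rules a given loading setup violates."""
    problems = []
    # Rule 1: int8 loading without a device_map is rejected.
    if load_in_8bit and device_map is None:
        problems.append("int8 loading requires device_map")
    # Rules 2 + 3: a device_map turns low_cpu_mem_usage on, which ZeRO-3 rejects.
    effective = resolve_low_cpu_mem_usage(device_map, low_cpu_mem_usage)
    if zero3 and effective:
        problems.append("ZeRO-3 requires low_cpu_mem_usage=False")
    return problems


# With int8 and ZeRO-3 both enabled, every device_map choice violates a rule.
for device_map in (None, "auto"):
    print(device_map, check_setup(load_in_8bit=True, device_map=device_map, zero3=True))
# None ['int8 loading requires device_map']
# auto ['ZeRO-3 requires low_cpu_mem_usage=False']
```

Under these assumptions there is no combination of device_map and low_cpu_mem_usage that satisfies both int8 loading and ZeRO-3, which is exactly the deadlock I am hitting.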
Thanks in advance for any pointers!