Hi,
I am trying to run mixed int8 training with DeepSpeed ZeRO3 across multiple GPUs and have run into a problem that I hope someone can help clarify.
It seems that mixed int8 requires specifying device_map during model loading (from_pretrained()), which automatically sets low_cpu_mem_usage=True. However, ZeRO3 requires low_cpu_mem_usage=False. What would be a reasonable solution here?
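
To make the conflict concrete, here is a minimal sketch of the two settings as I understand them. It only uses plain dictionaries and does not call transformers or DeepSpeed; the helper find_conflict is hypothetical and simply encodes my assumption that passing device_map implies low_cpu_mem_usage=True while ZeRO stage 3 needs it to be False.

```python
# Hypothetical illustration only: plain dictionaries, no transformers or DeepSpeed calls.

# Keyword arguments I would pass to from_pretrained() for mixed int8 loading.
int8_load_kwargs = {
    "load_in_8bit": True,
    "device_map": "auto",  # assumed to imply low_cpu_mem_usage=True
}

# Relevant fragment of a DeepSpeed config with ZeRO stage 3 enabled.
ds_config = {"zero_optimization": {"stage": 3}}


def find_conflict(load_kwargs, deepspeed_config):
    """Return a message if the two settings clash as described, else None."""
    zero3 = deepspeed_config.get("zero_optimization", {}).get("stage") == 3
    # Assumption from my reading: a device_map forces low_cpu_mem_usage=True.
    low_cpu_mem = (
        load_kwargs.get("device_map") is not None
        or load_kwargs.get("low_cpu_mem_usage", False)
    )
    if zero3 and low_cpu_mem:
        return "ZeRO3 needs low_cpu_mem_usage=False, but device_map implies True"
    return None


print(find_conflict(int8_load_kwargs, ds_config))
```

Dropping device_map or lowering the ZeRO stage makes the check pass in this sketch, but either change seems to defeat the purpose of the setup, which is why I am asking.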
Appreciate any pointers in advance!