ZeRO3 with int8 training

Kenkentron · August 11, 2023, 5:42pm

Hi,

I am trying to run mixed int8 training together with DeepSpeed ZeRO3 across multiple GPUs, and I have run into a problem that I hope someone can help clarify.

It seems that mixed int8 requires passing device_map when loading the model with from_pretrained(), which automatically sets low_cpu_mem_usage=True. However, ZeRO3 requires low_cpu_mem_usage=False, so the two requirements conflict. What would be a reasonable solution here?
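To make the conflict concrete, here is a minimal sketch of how I understand the two settings to interact. It does not call transformers or DeepSpeed at all: the function name `effective_low_cpu_mem_usage` and its parameters are hypothetical, made up for illustration, and the rules it encodes are only the two described above, not the library's actual code.

```python
from typing import Optional, Union


def effective_low_cpu_mem_usage(
    device_map: Optional[Union[str, dict]],
    low_cpu_mem_usage: Optional[bool],
    zero3_enabled: bool,
) -> bool:
    """Toy model of the loading options (hypothetical helper, not a library API)."""
    # Assumption taken from the post: passing a device_map turns
    # low_cpu_mem_usage on automatically.
    if device_map is not None:
        low_cpu_mem_usage = True

    # Assumption taken from the post: ZeRO3 needs low_cpu_mem_usage=False.
    if zero3_enabled and low_cpu_mem_usage:
        raise ValueError(
            "ZeRO3 needs low_cpu_mem_usage=False, but device_map forced it to True"
        )

    return bool(low_cpu_mem_usage)


if __name__ == "__main__":
    # int8 loading alone, and ZeRO3 alone, are each fine in this model.
    print(effective_low_cpu_mem_usage("auto", None, zero3_enabled=False))
    print(effective_low_cpu_mem_usage(None, None, zero3_enabled=True))

    # The combination I am asking about is the one that fails.
    try:
        effective_low_cpu_mem_usage("auto", None, zero3_enabled=True)
    except ValueError as err:
        print(f"conflict: {err}")
```

Under these assumptions, either setup works on its own, and only the combination of device_map (for int8) with ZeRO3 is rejected.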

Appreciate any pointers in advance!
