Finetuning Llama-2-70B with 4-bit quantization on multiple GPUs using DeepSpeed ZeRO

DeepSpeed ZeRO-3 doesn’t work with 4-bit quantization yet. DeepSpeed recently announced support for quantization down to 8-bit, which you enable by adding a new ‘quantization’ section to your DeepSpeed JSON config file; a sketch of what such a config might look like is below.
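
For reference, here is a minimal sketch of a ZeRO-3 config with 8-bit quantized weight and gradient communication turned on. It assumes the ZeRO++-style flags (`zero_quantized_weights`, `zero_quantized_gradients`, `zero_hpz_partition_size`) documented in recent DeepSpeed releases; the exact section and key names may differ across DeepSpeed versions, so check them against the config docs for your install.

```json
{
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true,
    "zero_hpz_partition_size": 8
  }
}
```

`zero_hpz_partition_size` is typically set to the number of GPUs per node. With a config like this saved as `ds_config.json`, a typical launch would look like `deepspeed train.py --deepspeed ds_config.json` (the file and script names here are placeholders, not from the original post).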