Finetuning Llama-2-70B with 4-bit quantization on multiple GPUs using DeepSpeed ZeRO

DeepSpeed ZeRO-3 doesn’t work with 4-bit quantization yet. DeepSpeed recently announced support for quantization down to 8-bit, which you enable by adding a new ‘quantization’ section to your DeepSpeed JSON config file; a sketch of what such a config might look like is below.
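
For reference, here is a minimal sketch of a ZeRO-3 config with 8-bit quantized weight and gradient communication turned on. It assumes the ZeRO++-style flags (`zero_quantized_weights`, `zero_quantized_gradients`, `zero_hpz_partition_size`) documented in recent DeepSpeed releases; the exact section and key names may differ across DeepSpeed versions, so check them against the config docs for your install.

```json
{
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true,
    "zero_hpz_partition_size": 8
  }
}
```

`zero_hpz_partition_size` is typically set to the number of GPUs per node. With a config like this saved as `ds_config.json`, a typical launch would look like `deepspeed train.py --deepspeed ds_config.json` (the file and script names here are placeholders, not from the original post).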