CUDA OOM on 4 A6000s (142 GB of VRAM) even after using ZeRO-3, QLoRA, Accelerate, and Max_token_length

Switching from DeepSpeed ZeRO stage 3 to ZeRO stage 2 seems to avoid the OOM in some cases.
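If you want to try this, the change is the `stage` value under `zero_optimization` in the DeepSpeed config. ZeRO-2 shards optimizer states and gradients across GPUs, while ZeRO-3 also shards the model parameters. Below is a minimal sketch of a stage-2 config built in Python. It is an assumption, not a verified fix for this setup: the batch size, accumulation steps and bf16 setting are illustrative placeholders, and the file name `ds_zero2.json` is hypothetical.

```python
import json

# Hypothetical minimal DeepSpeed config using ZeRO stage 2 instead of stage 3.
# Everything except "stage" is an illustrative placeholder; tune for your run.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder: small per-GPU batch to limit activation memory
    "gradient_accumulation_steps": 8,      # placeholder: recovers a larger effective batch
    "bf16": {"enabled": True},             # A6000 (Ampere) supports bf16
    "zero_optimization": {
        "stage": 2,                        # shard optimizer states + gradients, not parameters
        "contiguous_gradients": True,      # reduce memory fragmentation during backward
    },
}

# Serialize so it can be saved as e.g. ds_zero2.json (hypothetical name).
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

Saved to a file, a config like this can be pointed to from Accelerate's DeepSpeed settings (for example the `deepspeed_config_file` option) in place of the existing ZeRO-3 config.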