Memory consumption of QLoRA with gradient checkpointing

Hi!
I’ve used QLoRA with gradient checkpointing on llama-2-7b, and I’m surprised by how much VRAM it takes when calling forward on a 2577-token input: before the forward pass it was using only 4.8 GB, but during forward it crashed with an OOM on an A100 80GB!

I enabled gradient checkpointing with:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(wd, load_in_4bit=True)
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
model.gradient_checkpointing_enable()
peft_config = LoraConfig(r=8, task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, peft_config)
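For scale, here is my rough back-of-the-envelope estimate of the attention-score activations at this sequence length. The numbers are my assumptions, not measurements: 32 attention heads, fp16 activations, and a vanilla attention implementation that materializes the full seq × seq score matrix (no flash / memory-efficient attention):

```python
# Back-of-the-envelope estimate of the attention-score activations for one
# layer of llama-2-7b. Assumptions (mine, not measured): 32 attention heads,
# fp16 activations, and an attention implementation that materializes the
# full seq x seq score matrix (no flash / memory-efficient attention).
seq_len = 2577
n_heads = 32
bytes_per_elem = 2  # fp16

# One (heads, seq, seq) score matrix per layer.
attn_scores_gib = n_heads * seq_len**2 * bytes_per_elem / 1024**3
print(f"~{attn_scores_gib:.2f} GiB of attention scores per layer")
# → ~0.40 GiB of attention scores per layer
```

Even with all 32 layers held at once that would be roughly 13 GiB, and with gradient checkpointing only about one layer’s activations should be live at a time, so I don’t see where 80 GB could go.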

It shouldn’t use that much VRAM with gradient checkpointing, should it?

Thanks,
Christophe