I was LoRA finetuning a Llama 70B model and turned on `gradient_checkpointing: True` in my training config, but it has no effect on memory consumption at all; I see no difference whether I set the flag to False or True. Any idea why that would be the case?
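For context, my setup looks roughly like this (a minimal sketch assuming Hugging Face Transformers + PEFT with the Trainer; the model id, LoRA settings, and batch size here are illustrative, not my exact config):

```python
# Minimal sketch of the setup (illustrative values, not my exact config).
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",   # placeholder 70B checkpoint
    torch_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections; only these are trainable.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,   # the flag that seems to make no difference
)
# trainer = Trainer(model=model, args=args, train_dataset=...)  # dataset omitted
```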
Isn’t it because LoRA only has trainable parameters on the order of a few tens of millions, which isn’t significant memory-wise?
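Rough back-of-the-envelope numbers to show the scale I mean (assuming bf16 storage and a typical adapter size; these figures are illustrative, not measured):

```python
# Order-of-magnitude comparison: frozen base weights vs. LoRA trainable params
# (illustrative numbers, assuming bf16 storage at 2 bytes per parameter).
base_params = 70e9   # Llama 70B frozen base weights
lora_params = 40e6   # LoRA adapters, a few tens of millions of params

bytes_per_param = 2  # bf16
print(f"base weights : ~{base_params * bytes_per_param / 1e9:.0f} GB")  # ~140 GB
print(f"LoRA adapters: ~{lora_params * bytes_per_param / 1e6:.0f} MB")  # ~80 MB
print(f"trainable fraction: {lora_params / base_params:.2%}")           # ~0.06%
```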