Hi, I have recently been using LoRA (via peft) + the transformers Trainer + DeepSpeed (ZeRO-3) to fine-tune my model (around 7B params). Before this, I tried full-parameter fine-tuning as well.
The weird thing is that LoRA does not seem to reduce GPU memory usage compared with full-parameter fine-tuning.
My question is: when using LoRA, I checked that most parameters have `requires_grad` set to `False`. Will this help reduce the Adam optimizer's GPU memory usage? I would think setting `requires_grad` to `False` should at least cut the memory used by the gradient tensors, but overall GPU memory did not drop much, so I am wondering whether that is because the memory taken by the optimizer states is still the same as before. Does the optimizer keep states for model weights that do not require gradients?
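For context, here is roughly how I sanity-checked which parameters actually end up with optimizer states. This is a minimal sketch outside of Trainer/DeepSpeed, using a small model and placeholder LoRA settings just for illustration (not my real 7B config):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Small model just to illustrate; my real setup uses a ~7B checkpoint.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Placeholder LoRA settings, not my actual config.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters should require grad

# Build an optimizer over the trainable parameters only.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# The optimizer only tracks the parameters it was given, so Adam states
# (exp_avg / exp_avg_sq) will only ever be allocated for these.
n_tracked = sum(
    p.numel() for group in optimizer.param_groups for p in group["params"]
)
print(f"params tracked by the optimizer: {n_tracked:,}")
```

In this plain-PyTorch setup only the LoRA parameters go into the optimizer, so I would expect the Adam states to be tiny; that is why the mostly unchanged GPU memory under ZeRO-3 confuses me.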