I ran into this issue earlier.
The issue was caused by the loss value missing its `grad_fn`.
As stated in the documentation of gradient checkpointing:
> If `use_reentrant=True` is specified, at least one of the inputs needs to have `requires_grad=True` if grads are needed for model inputs, otherwise the checkpointed part of the model won't have gradients. At least one of the outputs needs to have `requires_grad=True` as well. Note that this does not apply if `use_reentrant=False` is specified.
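
To make that concrete, here is a minimal, self-contained illustration of the documented behavior; the `Linear` layer and shapes are arbitrary placeholders, not Whisper code:

```python
import torch
import torch.utils.checkpoint as cp

layer = torch.nn.Linear(4, 4)   # stand-in for a checkpointed model block
x = torch.randn(2, 4)           # input with requires_grad=False, e.g.
                                # features coming from a frozen input stage

# Reentrant checkpointing: since no input requires grad, the output is
# detached from the autograd graph, so a loss built from it has no grad_fn.
out = cp.checkpoint(layer, x, use_reentrant=True)
print(out.requires_grad)        # False -> loss.backward() would fail

# Non-reentrant checkpointing has no such requirement: parameter gradients
# are tracked regardless of the inputs' requires_grad flags.
out = cp.checkpoint(layer, x, use_reentrant=False)
print(out.requires_grad)        # True -> the loss keeps its grad_fn
```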
Thus, I fixed it by passing `use_reentrant=False` to `torch.utils.checkpoint.checkpoint()` in the transformers/src/transformers/models/whisper/modeling_whisper.py file.
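
For reference, the change amounts to something like the sketch below. This is illustrative only: the actual call sits inside the encoder/decoder forward loop in modeling_whisper.py, and the exact arguments vary across transformers versions; `layer`, `hidden_states`, and `attention_mask` stand in for the local variables there.

```python
import torch
import torch.utils.checkpoint

# Illustrative sketch of the edited call, not the verbatim upstream code.
def checkpointed_block(layer, hidden_states, attention_mask):
    return torch.utils.checkpoint.checkpoint(
        layer,                # the Whisper encoder/decoder layer
        hidden_states,
        attention_mask,
        use_reentrant=False,  # the added flag; see the docs quoted above
    )
```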