Gradient_checkpointing = True results in error

ArnauC · October 13, 2021, 4:32pm

Hi all,

I’m trying to finetune a summarization model (bigbird-pegasus-large-bigpatent) on my own data.
Of course, even with premium Colab I’m running into memory issues, so I tried setting gradient_checkpointing=True in the Seq2SeqTrainingArguments, which is supposed to save some memory at the cost of longer computation time.

The problem is that when training starts, this argument raises an error:

AttributeError: module 'torch.utils' has no attribute 'checkpoint'

Has anyone experienced this same error?
I read in these GitHub discussions:
https://github.com/huggingface/transformers/issues/9617
https://github.com/huggingface/transformers/issues/11193
https://github.com/huggingface/transformers/issues/9919

that the same error appeared in some other cases, but it was supposed to have been fixed here:
https://github.com/huggingface/transformers/pull/9626

Any help would be appreciated.
Thanks

sfalk · December 16, 2021, 2:16pm

Hi! I am facing a similar issue. Have you been able to solve it?

The code that is causing the problem here is the following:

import torch
from transformers import Speech2TextForConditionalGeneration

model_path = "facebook/s2t-small-librispeech-asr"

# Initialize the model and switch it to inference mode
model = Speech2TextForConditionalGeneration.from_pretrained(model_path)
model = model.eval()

# Attach decoder (SpeechRecognizer and labels are defined elsewhere in my code)
model = SpeechRecognizer(model, labels=labels)

# Apply quantization / script / optimize for mobile
quantized_model = torch.quantization.quantize_dynamic(model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)
scripted_model = torch.jit.script(quantized_model)

The SpeechRecognizer is just a simple torch.nn.Module wrapper.

ArnauC · December 17, 2021, 11:13am

Hi! I think that instead of adding gradient_checkpointing as an argument to training arguments, I used this line when I defined the model:

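from transformers import AutoModelForSeq2SeqLM

# model_checkpoint is the model name or path, defined earlier
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
model.gradient_checkpointing_enable()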
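
As for the original error message: my understanding (an assumption on my part, based on how Python imports work rather than on anything confirmed for this exact setup) is that "module X has no attribute Y" can show up when Y is a submodule that nobody has imported yet. Importing a package does not automatically import its submodules. Here is a minimal sketch that reproduces the mechanism with a throwaway package; "fakelib" is a made-up name used only for illustration and has nothing to do with torch itself:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: fakelib/utils/checkpoint.py
# ("fakelib" is a hypothetical name, used only for this illustration).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "fakelib", "utils")
os.makedirs(pkg_dir)
for init_file in (
    os.path.join(root, "fakelib", "__init__.py"),
    os.path.join(pkg_dir, "__init__.py"),
):
    open(init_file, "w").close()  # empty __init__.py: imports no submodules
with open(os.path.join(pkg_dir, "checkpoint.py"), "w") as f:
    f.write("def checkpoint(fn, *args):\n    return fn(*args)\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Importing the package alone does not load its 'checkpoint' submodule.
import fakelib.utils

try:
    fakelib.utils.checkpoint
    missing_before = False
    message = ""
except AttributeError as err:
    missing_before = True
    message = str(err)

# An explicit submodule import binds it as an attribute of the package.
import fakelib.utils.checkpoint

available_after = hasattr(fakelib.utils, "checkpoint")
result = fakelib.utils.checkpoint.checkpoint(lambda x: x + 1, 41)
```

If the same thing is happening in your environment, adding an explicit "import torch.utils.checkpoint" near the top of the training script might be worth a try, though I have not verified that it fixes this particular case.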

Anyway, we ended up training this model on GCP; it was too big.

Hope this helps!

jaideepcs · February 22, 2023, 1:36pm

@ArnauC did it behave weirdly?
