gradient_checkpointing = True results in an error

Hi all,

I’m trying to fine-tune a summarization model (bigbird-pegasus-large-bigpatent) on my own data.
Even with premium Colab I’m running out of memory, so I tried setting gradient_checkpointing = True in Seq2SeqTrainingArguments, which is supposed to save some memory at the cost of longer computation time.

The problem is that when training starts, this argument raises an error:

AttributeError: module 'torch.utils' has no attribute 'checkpoint'

Has anyone experienced this same error?
I read in the GitHub discussion:

that the same error had appeared in some other cases, but it was supposed to be solved here:

Any help would be appreciated.
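
A note on the error itself: Python raises this kind of AttributeError when a submodule has never been imported, because `import torch` only exposes the submodules that the package itself imports. A workaround that is sometimes suggested is to import `torch.utils.checkpoint` explicitly before training starts. The sketch below assumes that this is the cause here, which the thread does not confirm; `ensure_submodule` is a hypothetical helper name, not part of PyTorch or transformers.

```python
import importlib
import sys


def ensure_submodule(dotted_name: str) -> bool:
    """Explicitly import a submodule so it becomes an attribute of its parent.

    Returns True if the import succeeded, False if the module is not available.
    (Hypothetical helper, for illustration only.)
    """
    try:
        importlib.import_module(dotted_name)
    except ImportError:
        return False
    return dotted_name in sys.modules


# Assumed usage: run this before building the trainer, so that
# torch.utils.checkpoint can be resolved once gradient checkpointing is used.
if ensure_submodule("torch.utils.checkpoint"):
    print("torch.utils.checkpoint is loaded")
else:
    print("torch is not installed in this environment")
```

A plain `import torch.utils.checkpoint` at the top of the notebook has the same effect; the helper only makes the failure case explicit.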

Hi! I am facing a similar issue. Have you been able to solve it?

The code that is causing the problem here is the following:

import torch
from transformers import Speech2TextForConditionalGeneration

model_path = "facebook/s2t-small-librispeech-asr"

# Initialize the model
model = Speech2TextForConditionalGeneration.from_pretrained(model_path)

model = model.eval()
# Attach decoder
model = SpeechRecognizer(model, labels=labels)

# Apply quantization / script / optimize for mobile
quantized_model = torch.quantization.quantize_dynamic(model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)
scripted_model = torch.jit.script(quantized_model)

The SpeechRecognizer is just a simple torch.nn.Module wrapper.

Hi! As far as I remember, instead of passing gradient_checkpointing to the training arguments, I used this line when I defined the model:

model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

Anyway, we ended up training this model on GCP because it was too big.

Hope this helps!