Gradient_checkpointing = True results in error

Hi all,

I’m trying to fine-tune a summarization model (bigbird-pegasus-large-bigpatent) on my own data.
Even with premium Colab I’m running into memory issues, so I tried setting gradient_checkpointing = True in the Seq2SeqTrainingArguments, which is supposed to save some memory at the cost of increased computation time.
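
Here is roughly how I’m setting it up (output_dir, batch size, and fp16 are just placeholders, not my exact values):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./checkpoints",      # placeholder
    per_device_train_batch_size=1,   # placeholder
    fp16=True,                       # placeholder
    gradient_checkpointing=True,     # the argument that triggers the error
)
```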

The problem is that when training starts, this argument raises an error:

AttributeError: module 'torch.utils' has no attribute 'checkpoint'
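
In case it helps with debugging, my guess (an assumption based on the traceback, not confirmed) is that in some PyTorch versions torch.utils does not automatically import its checkpoint submodule, so the attribute only exists after an explicit import. A minimal sketch of what I mean:

```python
import torch

# In some PyTorch versions torch.utils does not expose `checkpoint`
# until the submodule is imported explicitly, which I suspect is what
# produces the AttributeError above.
import torch.utils.checkpoint

x = torch.randn(2, 4, requires_grad=True)
layer = torch.nn.Linear(4, 4)

# checkpoint() drops intermediate activations and recomputes them
# during the backward pass, trading compute for memory.
y = torch.utils.checkpoint.checkpoint(layer, x)
y.sum().backward()
```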

Has anyone experienced this same error?
I read in this GitHub discussion:

that the same error was appearing in some other cases, and that it was supposedly fixed here:

Any help would be appreciated.
Thanks