I'm running into the same issue, but with the mBART model. For some reason, training from scratch with the Seq2SeqTrainer works just fine, but resuming from a checkpoint exceeds the memory limit and produces a CUDA "out of memory" error.
I think it might be related to this issue on the GitHub repository.
@sshleifer I think this is another issue with training large models, as we discussed here, although this one just seems to be a bug in the trainer.