run_mlm.py CUDA out of memory error after resuming training

I’m running into the same issue, but with the mBART model. For some reason, training from scratch with the Seq2SeqTrainer works just fine, but resuming from a checkpoint exceeds the memory limit and produces a CUDA ‘out of memory’ error.
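Roughly, this is how I’m invoking it (model name, paths, and hyperparameters below are placeholders, not my exact config):

```python
from transformers import (
    MBartForConditionalGeneration,
    MBartTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

training_args = Seq2SeqTrainingArguments(
    output_dir="./mbart-output",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    save_steps=500,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # my tokenized dataset
    tokenizer=tokenizer,
)

# Training from scratch fits in GPU memory:
# trainer.train()

# Resuming from a saved checkpoint is what triggers the CUDA OOM:
trainer.train(resume_from_checkpoint="./mbart-output/checkpoint-500")
```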

I think it might be related to this issue on the GitHub repository.

@sshleifer I think this is another issue with training large models, as we discussed here, although this one just seems to be a bug in the Trainer.
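As a workaround I’m considering (untested), reloading only the model weights from the checkpoint and starting a fresh Trainer should sidestep loading the saved optimizer state onto the GPU, at the cost of losing the optimizer and scheduler state:

```python
from transformers import MBartForConditionalGeneration, Seq2SeqTrainer

# Load just the model weights from the checkpoint directory (path is a placeholder).
model = MBartForConditionalGeneration.from_pretrained("./mbart-output/checkpoint-500")

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,         # same training arguments as before
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)

# Fresh run, no resume_from_checkpoint, so the optimizer state is rebuilt from scratch.
trainer.train()
```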