I had the same problem while training a RoBERTa model. Resuming training from the last checkpoint failed, but loading the second-to-last checkpoint worked fine. So my last checkpoint was most likely corrupted, and the fix was to resume from an earlier checkpoint.
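If you want to automate that fallback, here is a minimal sketch. It assumes your checkpoints live in directories named `checkpoint-<step>` inside one output folder (the layout the Hugging Face `Trainer` uses, which may not match your setup). The helper names `checkpoints_newest_first` and `latest_loadable_checkpoint` are made up for illustration, and `try_load` is whatever loading call fails for you on the broken checkpoint.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional


def checkpoints_newest_first(output_dir: str) -> List[Path]:
    """Return checkpoint-<step> directories, highest step first."""
    # Assumption: directories are named like "checkpoint-1500".
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for path in Path(output_dir).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found, reverse=True)]


def latest_loadable_checkpoint(
    output_dir: str, try_load: Callable[[Path], object]
) -> Optional[Path]:
    """Return the newest checkpoint that try_load accepts, or None."""
    for path in checkpoints_newest_first(output_dir):
        try:
            # try_load is supplied by the caller and should raise on a bad checkpoint.
            try_load(path)
        except Exception as err:
            print(f"Skipping {path.name}: {err}")
            continue
        return path
    return None
```

With the `Trainer`, the returned path could then be passed as `trainer.train(resume_from_checkpoint=str(path))`. Keep the broken checkpoint around until the resumed run has saved a new one, in case you need to inspect it.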