Error when loading some checkpoints

I fine-tuned the T5 model and want to load the checkpoint. I have done this successfully many times, but I got an error for one model:

INFO:transformers.tokenization_utils_base:loading file https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model from cache at /home/t-miahu/.cache/torch/transformers/68f1b8dbca4350743bb54b8c4169fd38cbabaad564f85a9239337a8d0342af9f.9995af32582a1a7062cb3173c118cb7b4636fa03feb967340f20fc37406f021f
Traceback (most recent call last):
  File "eval_checkpoint_WEnoLem_PI_fromPI.py", line 39, in <module>
    model = T5FineTuner.load_from_checkpoint(PATH)
  File "/home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/pytorch_lightning/core/lightning.py", line 1514, in load_from_checkpoint
    checkpoint = torch.load(checkpoint_path, map_location=lambda storage, loc: storage)
  File "/home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f47cc697193 in /home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f472cc2b9eb in /home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f472cc2cc04 in /home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x6c53a6 (0x7f47cd5623a6 in /home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x2961c4 (0x7f47cd1331c4 in /home/t-miahu/anaconda3/envs/T5env/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
...
frame #39: __libc_start_main + 0xe7 (0x7f47d1d25b97 in /lib/x86_64-linux-gnu/libc.so.6)
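The assertion compares the format version stored inside the checkpoint with the highest version this PyTorch build can read (3 versus 2 here). To see whether the failing checkpoint differs from the ones that load fine, a minimal sketch using only the standard library can read that number directly. It assumes the zip-based layout that `torch.save` writes by default in newer releases, where the version is stored as text in an archive entry whose name ends in `/version` (for example `archive/version`); the helper name `checkpoint_format_version` is hypothetical.

```python
import zipfile
from typing import Optional


def checkpoint_format_version(path: str) -> Optional[int]:
    """Return the format version recorded in a zip-based PyTorch checkpoint.

    Returns None when the file is not a zip archive (for example the legacy
    torch.save format) or when no version entry is found.
    Assumption: the version lives in an entry named "version" or ending in
    "/version", such as "archive/version" or "archive/.data/version".
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name == "version" or name.endswith("/version"):
                # The entry holds the number as ASCII text, e.g. b"3\n".
                return int(zf.read(name).decode("ascii").strip())
    return None
```

Calling it on the failing checkpoint and on one that loads fine should show whether the two files were written with different format versions; `None` means the file is not a zip archive at all.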

It seems to be a version problem, but why didn't I get this error before? Any idea why the other checkpoints load fine but this one fails?
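The error text suggests that this checkpoint was written by a newer PyTorch than the one installed in `T5env`, for example if it was trained in a different environment. Besides upgrading PyTorch where the checkpoint is loaded, one workaround is to re-save the file in the older non-zip format from the environment that wrote it, using `torch.save`'s `_use_new_zipfile_serialization=False` flag. A minimal sketch, assuming that flag is available in the writing environment; `resave_as_legacy` and the file paths are hypothetical.

```python
import torch


def resave_as_legacy(src: str, dst: str) -> None:
    """Re-save a checkpoint in the legacy (non-zip) torch.save format.

    Run this with the newer PyTorch that wrote `src`; older releases
    should then be able to load `dst`.
    """
    # map_location="cpu" avoids needing the device the checkpoint came from.
    # Recent releases default to weights_only=True, so a full Lightning
    # checkpoint may additionally need weights_only=False here.
    checkpoint = torch.load(src, map_location="cpu")
    # _use_new_zipfile_serialization=False selects the older on-disk format.
    torch.save(checkpoint, dst, _use_new_zipfile_serialization=False)
```

`T5FineTuner.load_from_checkpoint` can then be pointed at the re-saved file in the older environment.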

I have the same issue. Did you solve it?

Hi @maroo93, could you post the details of your issue?