OSError: Unable to load weights from pytorch checkpoint file

Hi, everyone. I need some help. I have been developing a Flask website that embeds one of Transformers' fine-tuned models. I fine-tuned the model with PyTorch, tested the site on my local machine, and it worked fine.

I used a fine-tuned model whose weights I had already saved for local use, as pictured in the figure below:

The saved results contain:

  • config.json
  • pytorch_model.bin
  • special_tokens_map.json
  • tokenizer_config.json
  • vocab.txt

Then I tried to deploy it to the cloud instance that I have reserved. Everything worked well until the model loading step, which failed with:
OSError: Unable to load weights from PyTorch checkpoint file at <my model path/pytorch_model.bin>. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I’ve searched around the internet for a solution but found nothing. Can I get some enlightenment?

By the way, I’m using an Ubuntu 18.04 instance, and the package versions I’m using are:

  • torch 1.7.0
  • transformers 3.5.1

Thanks in advance!

Hi @aswincandra were you able to load the tokenizer in the Flask app without problems? My first guess is that the path you are pointing to in the app is not correct.

Thank you for your response, @lewtun. I tried that too: I commented out the model-loading line and loaded just the tokenizer. Then… it worked :sweat_smile: So I don’t think there is an issue with the path. :thinking:

Interesting. One possible way to debug the problem would be to try loading the state_dict in native PyTorch and then see what the error is, e.g.

import torch

# load the checkpoint directly to surface the underlying error
state_dict = torch.load(path_to_pytorch_bin_file, map_location="cpu")

This seems to be the step that is raising your OSError, so it could be a starting point.

Thank you so much, Lewis! I’ll try it in the next few hours. But may I first ask what the torch.load() function returns?

Is it the same as what I used previously? Previously, I used this function:
model = BertForSequenceClassification.from_pretrained(<my_model_path>)

Hi @aswincandra, the state_dict is just a Python dict that maps each layer to its corresponding tensors: What is a state_dict in PyTorch — PyTorch Tutorials 1.7.1 documentation
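To make that concrete, here is a minimal sketch (using a tiny stand-in model, not your fine-tuned BERT) showing that a state_dict is just an ordered mapping from parameter names to tensors:

```python
import torch

# A tiny stand-in model, just to illustrate the structure of a state_dict
model = torch.nn.Linear(4, 2)

state_dict = model.state_dict()
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
# weight (2, 4)
# bias (2,)
```

torch.load() on a pytorch_model.bin file returns the same kind of mapping, which from_pretrained then copies into a freshly built model.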

The reason I mentioned it is because I think your error is coming from this line of the from_pretrained function: transformers/modeling_utils.py at 748006c0b35d64cdee23a3cdc2107a1ce64044b5 · huggingface/transformers · GitHub

Right now you can’t see the lower-level error message from PyTorch, so trying to load it directly might shed some light on what the problem is :slight_smile:

Thank you for the insights, @lewtun. The torch.load() function loaded all of the parameters in each layer properly, and, although I’m not sure why, the model can now be loaded too :sweat_smile:. Perhaps it’s because I changed the library versions and then reverted them to the versions I used initially. Now I’m having another issue that isn’t related to the framework anymore. Thank you once again, Lewis!


@lewtun Hi, I’m having the same problem. I tried torch.load() and it gives this error: RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

I’m trying to load a model that I saved using transformers.Trainer.save_model.

Hey @imtrying, if you saved the model using the Trainer, you should be able to use the from_pretrained function to load it as follows:

# pick the appropriate Auto class for your task
from transformers import AutoModel

model = AutoModel.from_pretrained("path/to/folder/where/you/saved/your/model")

if that doesn’t work, perhaps you can share which version of transformers you are using and how you created the Trainer?

Hi, it’s solved. It turns out that the pytorch_model.bin was somehow corrupted, maybe because I saved the model on the GCP AI Platform, downloaded it directly, and uploaded it back to Google Colab.
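For anyone hitting the same "failed finding central directory" error: since PyTorch 1.6, torch.save writes checkpoints as zip archives, so a quick sanity check after transferring a file between machines is whether it is still a readable zip. A minimal sketch (the function name is my own, and the check only applies to the newer zip-based format, not older pickle-based checkpoints):

```python
import zipfile

def looks_like_valid_checkpoint(path):
    # PyTorch >= 1.6 saves checkpoints as zip archives; a truncated
    # or corrupted transfer usually loses the central directory,
    # which is exactly what the RuntimeError above complains about.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns None if no member is corrupt
        return zf.testzip() is None
```

Comparing file sizes or checksums before and after the download/upload round trip would also have caught this kind of corruption.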

good to hear it’s solved!