What to do when HuggingFace throws "Can't load tokenizer"

Whether I try the Inference API or run the code under “Use with Transformers”, I get the following long error:

“Can’t load tokenizer using from_pretrained, please update its configuration: Can’t load tokenizer for ‘remi/bertabs-finetuned-extractive-abstractive-summarization’. If you were trying to load it from ‘https://huggingface.co/models’, make sure you don’t have a local directory with the same name. Otherwise, make sure ‘remi/bertabs-finetuned-extractive-abstractive-summarization’ is the correct path to a directory containing all relevant files for a BertTokenizerFast tokenizer.”

This happens not just with this specific model but with many models I have tried to run. Is there any solution for this?
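The error text itself names two things to check: whether a local directory with the same name as the repo id is shadowing the Hub repo, and whether the directory actually holds the tokenizer files. A minimal stdlib sketch of both checks follows. The file list is an assumption for BERT-style tokenizers (`vocab.txt` for the slow tokenizer, `tokenizer.json` for the fast one); other tokenizer classes need different files, and `diagnose_tokenizer_path` is a hypothetical helper, not part of transformers.

```python
from pathlib import Path

# Assumed file names for a BERT tokenizer: BertTokenizerFast can be built from
# either the WordPiece vocabulary (vocab.txt) or a serialized tokenizer.json.
BERT_TOKENIZER_FILES = ("vocab.txt", "tokenizer.json")


def diagnose_tokenizer_path(repo_id: str, cwd: str = ".") -> str:
    """Return a short hint about why a tokenizer might fail to load."""
    # from_pretrained treats an existing local directory with this relative
    # path as the source, so a stray folder named like the repo id wins.
    local = Path(cwd) / repo_id
    if not local.exists():
        return "no local directory shadows the repo id; check the Hub repo contents"
    if not local.is_dir():
        return f"'{local}' exists but is not a directory"
    present = [name for name in BERT_TOKENIZER_FILES if (local / name).is_file()]
    if not present:
        return f"local directory '{local}' shadows the repo id and has no tokenizer files"
    return f"local directory '{local}' has tokenizer files: {', '.join(present)}"


print(diagnose_tokenizer_path("remi/bertabs-finetuned-extractive-abstractive-summarization"))
```

Run it from the same working directory as the failing script; if it reports a shadowing directory, renaming or removing that folder lets the Hub repo be used instead.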


I have the same error. Replying to get feedback.

Hi, can you try:

from transformers import BertForMaskedLM

# Loads only the model config and weights; no tokenizer files are needed here
model = BertForMaskedLM.from_pretrained("remi/bertabs-finetuned-extractive-abstractive-summarization")

This worked for me :slight_smile:
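The snippet above loads only the model weights, so a tokenizer still has to come from somewhere. One workaround, sketched here, is to fall back to the base model's tokenizer when the repo's own tokenizer fails to load. This assumes the checkpoint was fine-tuned from a base model without changing its vocabulary; `bert-base-uncased` below is a guess, so check the model card. transformers raises "Can't load tokenizer" as an `OSError`, which is what the helper catches; `load_with_fallback` is a hypothetical name.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_fallback(load: Callable[[str], T], repo_id: str, fallback_id: str) -> T:
    """Try loading from repo_id; if that raises OSError, load from fallback_id."""
    try:
        return load(repo_id)
    except OSError as err:
        # transformers reports "Can't load tokenizer ..." as an OSError.
        print(f"Falling back to {fallback_id!r}: {err}")
        return load(fallback_id)


# Hypothetical usage (requires transformers and network access):
# from transformers import AutoTokenizer
# tokenizer = load_with_fallback(
#     AutoTokenizer.from_pretrained,
#     "remi/bertabs-finetuned-extractive-abstractive-summarization",
#     "bert-base-uncased",  # assumed base model; verify on the model card
# )
```

Passing the loader in as a function keeps the helper independent of transformers, so the same pattern works for `AutoTokenizer`, `BertTokenizerFast`, or any other `from_pretrained`.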


@EssamWisam How were you able to solve this problem? I am facing the same issue. The code works on one of my GPUs, but when I try to run it on an Azure GPU I get this error.

Network issues can also cause this error.
In my case, I solved it by setting a proxy; you may want to check your network connection.
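Most Python HTTP clients, including `requests`, honor the `HTTP_PROXY` and `HTTPS_PROXY` environment variables, so one way to set a proxy is to export them before the first download. The sketch below does that from Python for the current process and also returns a `{"scheme": url}` dict. The proxy address is a hypothetical placeholder, and while `from_pretrained` in transformers 4.x has accepted a `proxies` dict of this shape, check the documentation of your installed version before relying on it.

```python
import os


def configure_proxy(proxy_url: str) -> dict:
    """Route HTTP(S) traffic from this process (and its children) through a proxy.

    Returns a {"scheme": url} dict, the shape requests-style `proxies`
    arguments expect.
    """
    for var in ("HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[var] = proxy_url
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical proxy address; replace it with your network's proxy.
# Call this before the first from_pretrained() so downloads use the proxy.
proxies = configure_proxy("http://proxy.example.com:8080")
```

Setting the variables in the shell instead (for example in the Azure job's environment) has the same effect and keeps network details out of the code.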


Thanks so much!


I encountered a similar error but was able to resolve it by following the Hugging Face documentation.

  1. First, log in to the Hugging Face Hub from the notebook by running the following commands:

!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

Note: Hugging Face access tokens come in two types, ‘read’ and ‘write’. Use a ‘write’ token here, since pushing files to the Hub requires write access.

  2. Next, push the model files to the Hub:

model.push_to_hub("Model_Name")

  3. Similarly, push the tokenizer files to the Hub:

tokenizer.push_to_hub("Model_Name")

And that’s it: your problem is solved :hugs:
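Pushing the model and tokenizer as a pair makes it harder to end up with a repo that has weights but no tokenizer files. A minimal sketch, assuming `model` and `tokenizer` are transformers objects exposing `push_to_hub(repo_id)` and that you are already logged in; `push_model_and_tokenizer` and `HubPushable` are hypothetical names, and any object with a `push_to_hub` method will work with them.

```python
from typing import Optional, Protocol


class HubPushable(Protocol):
    """Anything with push_to_hub(repo_id), e.g. a transformers model or tokenizer."""

    def push_to_hub(self, repo_id: str): ...


def push_model_and_tokenizer(
    model: HubPushable, tokenizer: Optional[HubPushable], repo_id: str
) -> None:
    """Push the model and its tokenizer to the same Hub repo."""
    if tokenizer is None:
        # A repo without tokenizer files is what later triggers
        # "Can't load tokenizer" in pipelines and AutoTokenizer.
        raise ValueError("tokenizer is None; the Hub repo would lack tokenizer files")
    model.push_to_hub(repo_id)
    tokenizer.push_to_hub(repo_id)


# Usage with real objects (assumes transformers is installed and you are logged in):
# push_model_and_tokenizer(model, tokenizer, "Model_Name")
```

Failing loudly on a missing tokenizer is a deliberate choice: an upload that silently skips it looks fine until someone tries to run inference.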


I ran into the error above while following the “Causal language modeling” guide. I tried ONLY step 3 in my notebook and it worked: I can now load the model and run inference on it, both on the Hub and locally. That means the tokenizer was missing; trainer.push_to_hub() did not generate or push it to the Hub. Thank you!

You may also encounter this error if the model is gated, in which case you need to accept the license terms before you can use it.

Go to the model card on Hugging Face, accept the license terms, generate an access token, and add the token to your program:

import os
# Set this before loading the model; replace 'my-key' with your access token
os.environ['HF_TOKEN']='my-key'

This worked for me.
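Hardcoding the key in source is easy to leak when a notebook is shared. A slightly safer pattern is to export `HF_TOKEN` in the shell and only read it in Python, failing with a clear message when it is missing. huggingface_hub also reads `HF_TOKEN` from the environment on its own, so passing it explicitly is optional. `get_hf_token` is a hypothetical helper, and `org/gated-model` below is a placeholder repo id.

```python
import os


def get_hf_token(var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment instead of hardcoding it."""
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set. Accept the model's license on its model card, "
            "create an access token in your Hugging Face settings, and export it."
        )
    return token


# Hypothetical usage (recent transformers versions accept token=;
# older ones used use_auth_token=):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("org/gated-model", token=get_hf_token())
```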
