What to do when HuggingFace throws "Can't load tokenizer"

EssamWisam · September 12, 2022, 10:31pm

Whether I try the Inference API or run the code under “Use with Transformers”, I get the following long error:

“Can’t load tokenizer using from_pretrained, please update its configuration: Can’t load tokenizer for ‘remi/bertabs-finetuned-extractive-abstractive-summarization’. If you were trying to load it from ‘Models - Hugging Face’, make sure you don’t have a local directory with the same name. Otherwise, make sure ‘remi/bertabs-finetuned-extractive-abstractive-summarization’ is the correct path to a directory containing all relevant files for a BertTokenizerFast tokenizer.”

This happens not only with this specific model but with many models I have tried to run. Is there any solution for this?

matthew-edgin · November 17, 2022, 2:24pm

I have the same error. Replying to get feedback.

stevhliu · November 17, 2022, 3:18pm

Hi, can you try:

from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("remi/bertabs-finetuned-extractive-abstractive-summarization")

This worked for me.

kkedia · June 9, 2023, 9:05am

@EssamWisam How were you able to solve this problem? I am facing the same issue. The code works on one of the GPUs, but when I try to run it on an Azure GPU I get this error.

double-fire · August 2, 2023, 11:59am

Network issues can also cause this error.
In my case I solved it by setting a proxy, so it may be worth checking your network.
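
A minimal sketch of the proxy approach, assuming your HTTP client honors the standard `HTTP_PROXY` and `HTTPS_PROXY` environment variables. The proxy address and the `configure_proxy` helper are hypothetical placeholders, not something from this thread; whether this helps depends on your network setup.

```python
import os

# Hypothetical proxy address; replace it with the one your network actually uses.
PROXY_URL = "http://proxy.example.com:8080"


def configure_proxy(proxy_url: str) -> None:
    """Export the proxy so clients that read these variables route through it."""
    os.environ["HTTP_PROXY"] = proxy_url
    os.environ["HTTPS_PROXY"] = proxy_url


# Run this before the first download attempt in the same process.
configure_proxy(PROXY_URL)
```

After this runs, retry the `from_pretrained` call in the same session.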

wanghao001 · October 12, 2023, 11:17am

Thanks so much!

Areeb123 · November 16, 2023, 2:43pm

I encountered a similar error but was able to resolve it by referring to the Hugging Face documentation.

First, log in to the Hugging Face Hub from the notebook by running the following commands:

!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

Note: Your Hugging Face account can generate two types of tokens, ‘read’ and ‘write’. Use the ‘write’ token for authorization, since pushing to the Hub requires write access.

Next, push the files associated with the model to the Hub:

model.push_to_hub("Model_Name")

Similarly, push the tokenizer-related files to the Hub:

tokenizer.push_to_hub("Model_Name")

And that’s it, your problem is solved.

tunggad · December 22, 2023, 10:39pm

I ran into the error reported above while following the Causal language modeling guide. I tried ONLY step 3 in my notebook and it worked: I can now load the model and run inference on it, both on the Hub and locally. That means the tokenizer was missing, because trainer.push_to_hub() did not generate or push it to the Hub. Thank you!
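
To confirm that the tokenizer really is what's missing, one option is to check a locally saved model directory for tokenizer files before pushing. This is only a sketch: the file names below are an assumption about what a BERT-style fast tokenizer typically saves, and `missing_tokenizer_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Assumed file names: what a BERT-style fast tokenizer typically writes on save.
CONFIG_FILE = "tokenizer_config.json"
VOCAB_FILES = ("tokenizer.json", "vocab.txt")


def missing_tokenizer_files(model_dir):
    """Return descriptions of tokenizer files that are absent from model_dir."""
    present = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    missing = []
    if CONFIG_FILE not in present:
        missing.append(CONFIG_FILE)
    # Either the fast-tokenizer JSON or the plain vocabulary file is enough here.
    if not any(name in present for name in VOCAB_FILES):
        missing.append(" or ".join(VOCAB_FILES))
    return missing
```

An empty list suggests the tokenizer was saved alongside the model. A non-empty list points to the situation described above, where only the model weights were written and the tokenizer still has to be saved or pushed separately.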

Meera-Datey · May 5, 2024, 7:33am

You may encounter this error if the model is gated and you need to accept the licensing terms before using it.

Go to the model card on Hugging Face, accept the licensing terms, generate a key, and add the key to your program:

import os
os.environ['HF_TOKEN']='my-key'

This worked for me.
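
A variation on the same idea, as a sketch: read the key from the environment instead of hardcoding it, and pass it explicitly when loading. The `hub_auth_kwargs` helper and the `org/gated-model` name are hypothetical. The `token` argument is assumed to be supported by your installed transformers version; older releases used `use_auth_token` instead.

```python
import os


def hub_auth_kwargs():
    """Build keyword arguments for from_pretrained from the HF_TOKEN variable."""
    token = os.environ.get("HF_TOKEN")
    # Pass nothing when no token is set, so public models still load anonymously.
    return {"token": token} if token else {}


# Usage (not executed here; needs network access and the transformers package):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("org/gated-model", **hub_auth_kwargs())
```

Keeping the key out of the source file also avoids committing it by accident when the notebook or script is shared.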
