Error with new tokenizers (URGENT!)

denocris · December 16, 2020, 9:56am

Hi, recently all my pre-trained models have started throwing this error when loading their tokenizer:

Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I tried running pip install sentencepiece, but it did not solve the problem. Do you know of any solution? (I am working on Google Colab.)

Note: In my humble opinion, changing such important things so quickly can cause serious problems. All my students (I teach DL stuff) and clients are stuck on my notebooks. I can understand code becoming outdated after a year, but not after just two months. This requires a lot of maintenance work on my side!
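When pip install sentencepiece seems to have no effect, one thing worth checking is whether the package is actually visible to the Python process running the notebook. A minimal sketch using only the standard library (the helper name missing_packages is hypothetical). If both packages show up as importable but the error persists, restarting the runtime, which a later reply in this thread reports as helping, may be needed; my assumption is that an already-imported transformers checked for sentencepiece before it was installed.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names the current interpreter cannot locate."""
    # find_spec returns None when a top-level package cannot be found on sys.path
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["sentencepiece", "transformers"])
    if missing:
        print("Not importable in this runtime:", ", ".join(missing))
    else:
        print("sentencepiece and transformers are both importable")
```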

FL33TW00D · December 16, 2020, 10:17am

There were some breaking changes in the v4 release; please find the details here:
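To see which side of the v4 release a given runtime is on, a quick version check can help. As far as I understand the v4 changes, AutoTokenizer now loads the fast tokenizer by default and sentencepiece is no longer installed automatically, which would match the error above. A minimal sketch with hypothetical helper names (is_v4_or_later, installed_transformers_version), using only the standard library:

```python
from importlib import metadata


def is_v4_or_later(version):
    """Return True if a version string such as '4.0.0' has a major version >= 4."""
    major = version.split(".", 1)[0]
    return int(major) >= 4


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed")
    else:
        print(version, "(v4 or later)" if is_v4_or_later(version) else "(pre-v4)")
```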

denocris · December 16, 2020, 10:26am

Thank you so much @FL33TW00D. Two steps were required:

Upgrading pip to the latest version, 20.3.3 (on Colab I had version 19.x installed by default);
Setting use_fast=False.

Since I work a lot with Italian and personal custom models, all of them stopped working all of a sudden!
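The use_fast=False step can also be applied only when the fast tokenizer fails, so notebooks keep the default behaviour where it works. A minimal sketch, assuming the "Couldn't instantiate the backend tokenizer" message is raised as a ValueError (which I believe is the case in transformers); load_with_slow_fallback is a hypothetical helper, and load is expected to behave like AutoTokenizer.from_pretrained. Note that any other ValueError would also trigger the retry, and sentencepiece-based slow tokenizers still need sentencepiece installed.

```python
def load_with_slow_fallback(load, name):
    """Try the default (fast) tokenizer first; on ValueError, retry with use_fast=False.

    `load` is assumed to behave like AutoTokenizer.from_pretrained.
    """
    try:
        return load(name)
    except ValueError as err:
        # Assumption: the backend-tokenizer error surfaces as a ValueError
        print(f"Fast tokenizer failed ({err}); retrying with use_fast=False")
        return load(name, use_fast=False)


# Usage (needs transformers and network access; "XXXXX" is a placeholder model name):
# from transformers import AutoTokenizer
# tokenizer = load_with_slow_fallback(AutoTokenizer.from_pretrained, "XXXXX")
```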

Ogayo · December 16, 2020, 12:43pm

Hi @denocris, I am facing the same problem. I am a student and I depend on this for my presentation tomorrow, and I am also using Google Colab. Could you please explain step by step how to fix this sentencepiece problem? Thanks in advance.

Ogayo · December 16, 2020, 1:03pm

I resolved it.

Uninstalled transformers
Reinstalled transformers together with sentencepiece like this: !pip install --no-cache-dir transformers sentencepiece
Set use_fast=False like this: tokenizer = AutoTokenizer.from_pretrained("XXXXX", use_fast=False)
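The same uninstall and reinstall can be run from a plain Python script instead of a !pip notebook cell. A minimal sketch with hypothetical helper names (pip_commands, run_pip_steps); calling pip through sys.executable -m pip targets the interpreter that is running the code, which avoids installing into a different Python. The runtime most likely needs a restart afterwards, as another reply in this thread mentions.

```python
import subprocess
import sys


def pip_commands(packages=("transformers", "sentencepiece")):
    """Build argv lists for: uninstall transformers, then reinstall it with sentencepiece."""
    pip = [sys.executable, "-m", "pip"]
    uninstall = pip + ["uninstall", "-y", "transformers"]
    install = pip + ["install", "--no-cache-dir", *packages]
    return [uninstall, install]


def run_pip_steps():
    """Run the commands in order; this really modifies the current environment."""
    for cmd in pip_commands():
        subprocess.check_call(cmd)
```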

denocris · December 16, 2020, 4:41pm

Sorry @Ogayo, I have only just read this. I am happy you solved it. It is quite annoying to have this kind of issue from one day to the next. I had this error during a live presentation, and a couple of days earlier the notebook was working well!

kroshan · April 5, 2022, 7:23pm

Thank you @Ogayo

evegarcianz · February 15, 2023, 8:25pm

Thank you @Ogayo !!

zeelthumar-04 · June 12, 2023, 10:27am

THANKS A LOT @Ogayo

ashu3984 · October 2, 2023, 6:14pm

Thank you very much @Ogayo.

VishNikhil · November 22, 2023, 1:15pm

Hi @Ogayo, I'm still getting the same error even after following these steps.
What should I do?

borat123 · March 6, 2024, 12:00pm

Same problem here.
Solved it using:
pip install sentencepiece

Hope2000 · April 4, 2024, 7:42am

I tried using use_fast=False but got the following error:


SyntaxError: keyword argument repeated: use_fast

I think it may be deprecated now, but the installation part works, so thank you.
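For what it is worth, SyntaxError: keyword argument repeated is raised by Python itself when the same keyword is passed twice in one call (for example, adding use_fast=False to a line that already passes use_fast), so it does not by itself show that use_fast was deprecated. A small illustration that only compiles the snippets and never imports or calls transformers (syntax_error_message is a hypothetical helper; recent Python versions append the keyword name to the message, older ones do not):

```python
def syntax_error_message(source):
    """Compile a snippet without running it; return the SyntaxError message, or None."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


# use_fast passed twice in one call: rejected before anything runs
bad = 'AutoTokenizer.from_pretrained("XXXXX", use_fast=False, use_fast=False)'
# use_fast passed once: compiles fine (names are only resolved when the code runs)
good = 'AutoTokenizer.from_pretrained("XXXXX", use_fast=False)'

print(syntax_error_message(bad))   # message starts with "keyword argument repeated"
print(syntax_error_message(good))  # None
```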

MahadA · May 15, 2024, 9:40am

Hello @denocris, hope you are doing well.
I am getting this error:

KeyError Traceback (most recent call last)
in <cell line: 3>()
1 from transformers import AutoTokenizer, AutoModel
2
----> 3 tokenizer = AutoTokenizer.from_pretrained("musadac/vilanocr-multi-medical", use_fast=False)

/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in __getitem__(self, key)
740 model_name = self._model_mapping[mtype]
741 return self._load_attr_from_module(mtype, model_name)
--> 742 raise KeyError(key)
743
744 def _load_attr_from_module(self, model_type, attr):

KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'>

joyz29 · July 19, 2024, 9:48am

It works! Thanks so much

Kaxder23 · July 20, 2024, 9:59am

Thank you very much for this…

Kaxder23 · July 22, 2024, 7:58am

Hi, I faced the same issue while integrating it into a calculator-based website. Installing transformers with sentencepiece and then restarting the runtime worked for me. Try this:
pip install transformers sentencepiece
