Error with new tokenizers (URGENT!)

Hi, recently all my pre-trained models have started raising this error while loading their tokenizer:

Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I tried pip install sentencepiece, but this did not solve the problem. Does anyone know a solution? (I am working on Google Colab.)

Note: In my humble opinion, changing such important things so quickly can cause serious problems. All my students (I teach DL) and clients are stuck on my notebooks. I can understand code becoming outdated after a year, but not after just two months. This requires a lot of maintenance work on my side!

There were some breaking changes in the v4 release; please find the details here:


Thank you so much @FL33TW00D. Two steps were required.

  1. upgrade to the latest pip, version 20.3.3 (on Colab, version 19.x was installed by default);
  2. pass use_fast=False when loading the tokenizer.

I work a lot with Italian and personal custom models, and all of them stopped working all of a sudden!

Hi @denocris, I am facing the same problem. I am a student and I depend on this for my presentation tomorrow. I am also using Google Colab. Can you please explain, step by step, how to fix this sentencepiece problem? Thanks in advance.


I resolved it:

  1. Uninstalled transformers.
  2. Installed transformers together with sentencepiece, like this: !pip install --no-cache-dir transformers sentencepiece
  3. Set use_fast=False, like this: tokenizer = AutoTokenizer.from_pretrained("XXXXX", use_fast=False)
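The steps above can be sketched in a single Colab cell like this. Note that "bert-base-uncased" is just a stand-in model id for illustration; substitute your own checkpoint.

```python
# Run first in a Colab cell (the "!" prefix executes shell commands):
#   !pip uninstall -y transformers
#   !pip install --no-cache-dir transformers sentencepiece

from transformers import AutoTokenizer

# use_fast=False loads the slow (pure-Python) tokenizer, which skips the
# backend-tokenizer conversion step that raises the error above.
# "bert-base-uncased" stands in for your own model id.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

# The returned object is a slow tokenizer class (no "Fast" in its name).
print(type(tokenizer).__name__)
```

Restarting the Colab runtime after the reinstall can also help, since an already-imported transformers version stays loaded in the running kernel.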

Sorry @Ogayo, I have only just read this. I am happy you solved it. It is quite annoying to have this kind of issue appear from one day to the next. I had this error during a live presentation; a couple of days before, the notebook was working fine!