Error with new tokenizers (URGENT!)

Hi, recently all my pre-trained models have been hitting this error while loading their tokenizer:

Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I tried to pip install sentencepiece, but this does not solve the problem. Do you know of any solution? (I am working on Google Colab.)
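For reference, this is roughly the call that now fails; the checkpoint name below is just a placeholder for one of my models:

    # Placeholder checkpoint name; any of my fine-tuned models triggers the same error.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("my-org/my-finetuned-model")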

Note: In my humble opinion, changing such important things so quickly can cause serious problems. All my students (I teach DL courses) and clients are stuck on my notebooks. I can understand code becoming outdated after a year, but not after just two months. This requires a lot of maintenance work on my side!

5 Likes

There were some breaking changes in the v4 release; please find the details here:
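As far as I understand, the change most relevant to this thread is that AutoTokenizer now returns a fast (Rust-backed) tokenizer by default in v4, so checkpoints that only ship a sentencepiece-based slow tokenizer need sentencepiece installed, or use_fast=False. A rough sketch with a placeholder model name:

    from transformers import AutoTokenizer

    # v4 default: fast tokenizer (converting a slow one may require sentencepiece)
    tok_fast = AutoTokenizer.from_pretrained("some-org/some-model")

    # v3-style behaviour: explicitly ask for the slow tokenizer
    tok_slow = AutoTokenizer.from_pretrained("some-org/some-model", use_fast=False)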

1 Like

Thank you so much @FL33TW00D. Two steps were required.

  1. Upgrade to the latest pip, version 20.3.3 (Colab had 19.x installed by default);
  2. Pass use_fast=False when loading the tokenizer (see the sketch below).
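A minimal sketch of the two steps as Colab cells; "my-italian-model" is a placeholder checkpoint name:

    # Step 1: upgrade pip (Colab came with an older 19.x preinstalled).
    !pip install --upgrade pip

    # Step 2: request the slow tokenizer explicitly.
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("my-italian-model", use_fast=False)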

I work a lot with Italian and custom personal models, and all of them broke all of a sudden!

3 Likes

Hi @denocris, I am facing the same problem. I am a student and I depend on this for my presentation tomorrow. I am also using Google Colab. Can you please explain step by step how to fix this sentencepiece problem? Thanks in advance.

1 Like

I resolved it.

  1. Uninstalled transformers.
  2. Installed transformers together with sentencepiece, like this: !pip install --no-cache-dir transformers sentencepiece
  3. Set use_fast=False when loading, like this: tokenizer = AutoTokenizer.from_pretrained("XXXXX", use_fast=False) (full cell below)
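The same three steps combined into a single Colab cell; "XXXXX" stays as a placeholder checkpoint name:

    # 1. Remove the existing transformers install.
    !pip uninstall -y transformers

    # 2. Reinstall transformers together with sentencepiece, bypassing the pip cache.
    !pip install --no-cache-dir transformers sentencepiece

    # 3. Load the tokenizer with use_fast=False ("XXXXX" is a placeholder).
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("XXXXX", use_fast=False)

(After the reinstall, restarting the Colab runtime is usually needed before the import picks up the new packages.)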
24 Likes

Sorry @Ogayo, I have only just read this. I am happy you solved it. It is quite annoying to have this kind of issue appear from one day to the next. I had this error during a live presentation; a couple of days before, the notebook was working fine!

Thank you @Ogayo

1 Like

Thank you @Ogayo !!

1 Like

THANKS A LOT @Ogayo

1 Like

Thank you very much @Ogayo.

1 Like

Hi @Ogayo, I'm still getting the same error even after following the steps above.
What should I do?

1 Like

Same problem here.
Solved it using:
pip install sentencepiece

I tried using use_fast=False but got the following error:


SyntaxError: keyword argument repeated: use_fast

I think it's deprecated now! But the installation part works, so thank you.
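(On a second look, that particular SyntaxError normally means use_fast was written twice in the same call rather than the argument having been removed; a single use_fast=False still parses fine, as in the sketch below with a placeholder model name.)

    from transformers import AutoTokenizer

    # A single use_fast keyword is fine; repeating it in the same call is what
    # triggers "SyntaxError: keyword argument repeated: use_fast".
    tokenizer = AutoTokenizer.from_pretrained("my-model", use_fast=False)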

1 Like

Hello @denocris, hope you are doing well.
I am getting this error:


    KeyError                                  Traceback (most recent call last)
    in <cell line: 3>()
          1 from transformers import AutoTokenizer, AutoModel
          2
    ----> 3 tokenizer = AutoTokenizer.from_pretrained("musadac/vilanocr-multi-medical", use_fast=False)

    1 frames
    /usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in __getitem__(self, key)
        740             model_name = self._model_mapping[mtype]
        741             return self._load_attr_from_module(mtype, model_name)
    --> 742         raise KeyError(key)
        743
        744     def _load_attr_from_module(self, model_type, attr):

    KeyError: <class 'transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder.VisionEncoderDecoderConfig'>
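In case it helps to decode the traceback: the KeyError means AutoTokenizer has no tokenizer class registered for a VisionEncoderDecoderConfig, so it cannot pick one automatically. A possible workaround (not verified for this particular checkpoint) is to load the model's processor instead, which usually bundles the tokenizer:

    # Hedged sketch, untested for musadac/vilanocr-multi-medical: vision-encoder-decoder
    # checkpoints typically expose their tokenizer through a processor object.
    from transformers import AutoProcessor

    processor = AutoProcessor.from_pretrained("musadac/vilanocr-multi-medical")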

It works! Thanks so much!

Thank you very much for this…

Hi, I've faced the same issue while integrating it into a calculator-based website. Installing transformers with sentencepiece and restarting the runtime worked for me. Try this:

    pip install transformers sentencepiece