Cannot download translation models in Colab

ghosh-r · June 18, 2021, 9:34am

I am trying to translate English text to German, so I run this:

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

But I get this error:

ValueError: This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed in order to use this tokenizer.

Full error message

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-accbe9f8763e> in <module>()
----> 1 translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

1 frames
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
    441 
    442             tokenizer = AutoTokenizer.from_pretrained(
--> 443                 tokenizer_identifier, revision=revision, use_fast=use_fast, _from_pipeline=task, **tokenizer_kwargs
    444             )
    445 

/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    449                 else:
    450                     raise ValueError(
--> 451                         "This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed "
    452                         "in order to use this tokenizer."
    453                     )

ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.

Since the error suggests installing sentencepiece, I installed it via pip, but that did not help. I also tried importing it so that its namespace is available, but it still does not work.

Note: Besides the Helsinki-NLP/opus-mt-en-de model, I have also tried using the Helsinki-NLP/opus-mt-fr-en model as shown in the course video, but it does not work either.

What am I missing?

johnnyfivefingers · June 18, 2021, 12:18pm

Okay, I tried running this locally (not in Colab):

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

translation = translator("hello, my name is Bob")

print(translation)

and it printed out:

[{'translation_text': 'Hallo, mein Name ist Bob.'}]

I don’t know where you are running that code, but it seems to work fine for me. Have you installed the latest version of transformers?

pip install transformers -U

Where do you run that code?

Can you share the full code to see if something else is going on there?
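To answer the "which version?" question quickly, here is a minimal sketch that reports what is installed in the current environment. It uses only the standard library (importlib.metadata, so it assumes Python 3.8 or newer); the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages discussed in this thread.
for name in ("transformers", "sentencepiece"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If sentencepiece shows up as "not installed", that would match the ValueError above.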

BramVanroy · June 18, 2021, 12:49pm

Did you restart the kernel after installing sentencepiece?
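A rough way to tell whether a restart is worth trying, assuming (not verified against a specific transformers version) that the library records which optional packages are present when it is first imported. The function name restart_hint is hypothetical; the sketch only inspects the current Python session and does not import transformers itself.

```python
import importlib.util
import sys


def restart_hint():
    """Return a short hint based on the state of the current Python session."""
    # True if the package can be found on the import path right now.
    sp_installed = importlib.util.find_spec("sentencepiece") is not None
    # True if transformers has already been imported in this session.
    transformers_loaded = "transformers" in sys.modules

    if not sp_installed:
        return "sentencepiece is not installed in this environment."
    if transformers_loaded:
        # Assumption: transformers may have been imported before the install.
        return "sentencepiece is installed; if the error persists, restart the kernel."
    return "sentencepiece is installed and transformers is not loaded yet."


print(restart_hint())
```

In Colab, restarting the runtime and then re-running the import cell is the equivalent of restarting the kernel.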

sgugger · June 18, 2021, 1:21pm

Yes, make sure to run the latest version of the notebooks (the first cell should be ! pip install datasets transformers[sentencepiece]).

ghosh-r · June 18, 2021, 2:38pm

This works. Thanks.

I had created a new empty Notebook and was working on that. It wasn’t clear to me that I should only use Notebooks that appear if I click the “Open in Colab” button on the course pages.

Thanks for clarifying.
