How to change default tokenizer name in Use in Transformers

For each model, there’s a Use in Transformers button that shows how the model and its corresponding tokenizer can be loaded in code. For example, for a model that I trained and uploaded to the Hub:

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("phosseini/glucose-roberta-large")

model = AutoModelForMaskedLM.from_pretrained("phosseini/glucose-roberta-large")

The problem is that there is no tokenizer named phosseini/glucose-roberta-large, so the AutoTokenizer.from_pretrained call in this snippet fails with the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-8e9d5bcc29a4> in <module>()
      1 from transformers import AutoTokenizer, AutoModelForMaskedLM
      2 
----> 3 tokenizer = AutoTokenizer.from_pretrained("phosseini/glucose-roberta-large")
      4 
      5 model = AutoModelForMaskedLM.from_pretrained("phosseini/glucose-roberta-large")

1 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1763         if all(full_file_name is None for full_file_name in resolved_vocab_files.values()):
   1764             raise EnvironmentError(
-> 1765                 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
   1766                 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
   1767                 f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "

OSError: Can't load tokenizer for 'phosseini/glucose-roberta-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'phosseini/glucose-roberta-large' is the correct path to a directory containing all relevant files for a RobertaTokenizerFast tokenizer.
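To double-check that the repo really has no tokenizer files, something like the following rough sketch could be used. The helper name has_tokenizer_files and the list of file names are my own assumptions (they are the files a RoBERTa-style tokenizer usually ships with, not an official list); list_repo_files comes from the huggingface_hub package and needs network access.

```python
# Assumption: file names a RoBERTa-style tokenizer typically ships with.
TOKENIZER_FILES = {
    "vocab.json",
    "merges.txt",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}


def has_tokenizer_files(repo_files):
    """Return True if any known tokenizer file is in the repo's file list."""
    return any(name in TOKENIZER_FILES for name in repo_files)


def repo_has_tokenizer(repo_id):
    """Fetch the repo's file list from the Hub and check it (hypothetical helper)."""
    # Imported lazily so the pure helper above works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return has_tokenizer_files(list_repo_files(repo_id))


# Usage (needs network access):
# print(repo_has_tokenizer("phosseini/glucose-roberta-large"))
```

If this reports that no tokenizer files are present, that would be consistent with the OSError above.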

I believe this is also what causes the error in the Hugging Face UI when we press the Compute button to run the Hosted inference API.

It would be great if we could change the tokenizer name shown in that snippet. In this case, for example, the model’s tokenizer is roberta-large. (I’m not sure whether this is already possible or whether I’m missing something.)
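For reference, this is roughly what I do in my own code as a workaround: load the tokenizer from roberta-large and the model from my repo. It is only a sketch; the TOKENIZER_OVERRIDES mapping and the two helper functions are names I made up for illustration, not anything provided by transformers.

```python
# Hypothetical mapping from a model repo to the repo that actually
# holds its tokenizer files.
TOKENIZER_OVERRIDES = {
    "phosseini/glucose-roberta-large": "roberta-large",
}


def tokenizer_repo_for(model_id):
    """Return the repo to load the tokenizer from, defaulting to the model repo."""
    return TOKENIZER_OVERRIDES.get(model_id, model_id)


def load_model_and_tokenizer(model_id):
    """Load the tokenizer (possibly from another repo) and the masked-LM model."""
    # Imported lazily so tokenizer_repo_for works without transformers installed.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_repo_for(model_id))
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


# Usage (downloads files from the Hub):
# tokenizer, model = load_model_and_tokenizer("phosseini/glucose-roberta-large")
#
# Possibly a more permanent fix: upload the tokenizer files into the model repo
# so the default snippet works unchanged (requires write access to the repo).
# tokenizer.push_to_hub("phosseini/glucose-roberta-large")
```

If pushing the tokenizer files into the model repo works as I expect, the default Use in Transformers snippet should then load without changes, but I have not confirmed that this also fixes the Hosted inference API.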
