How to change default tokenizer name in Use in Transformers

phosseini · May 4, 2022, 6:12pm

For each model, there’s a Use in Transformers button to show how a model and its corresponding tokenizer can be used/loaded in code. For example, for a model that I trained and deployed to the hub:

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("phosseini/glucose-roberta-large")

model = AutoModelForMaskedLM.from_pretrained("phosseini/glucose-roberta-large")

The problem is that the tokenizer fails to load: since there is no tokenizer named phosseini/glucose-roberta-large, we get the following error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-4-8e9d5bcc29a4> in <module>()
      1 from transformers import AutoTokenizer, AutoModelForMaskedLM
      2 
----> 3 tokenizer = AutoTokenizer.from_pretrained("phosseini/glucose-roberta-large")
      4 
      5 model = AutoModelForMaskedLM.from_pretrained("phosseini/glucose-roberta-large")

1 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1763         if all(full_file_name is None for full_file_name in resolved_vocab_files.values()):
   1764             raise EnvironmentError(
-> 1765                 f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
   1766                 "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
   1767                 f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "

OSError: Can't load tokenizer for 'phosseini/glucose-roberta-large'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'phosseini/glucose-roberta-large' is the correct path to a directory containing all relevant files for a RobertaTokenizerFast tokenizer.

I believe this is also causing the error on HuggingFace’s UI when we press the compute button to run the Hosted inference API.

It would be great if we could change the tokenizer’s name in that snippet. For example, in this case, the model’s tokenizer is roberta-large. (I’m not sure whether this is already possible or whether I’m missing something.)
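In the meantime, one possible workaround is to load the tokenizer under the base model's name whenever the fine-tuned repository has no tokenizer files. The sketch below is only an illustration under a few assumptions: the helper name load_tokenizer_with_fallback and its loader parameter are hypothetical (not part of transformers), and it assumes the missing tokenizer surfaces as an OSError, as in the traceback above.

```python
def load_tokenizer_with_fallback(repo_id, fallback_id, loader=None):
    """Try to load a tokenizer from repo_id; on OSError, use fallback_id.

    `loader` is a hypothetical injection point for illustration. When it is
    omitted, AutoTokenizer.from_pretrained from transformers is used.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without transformers.
        from transformers import AutoTokenizer
        loader = AutoTokenizer.from_pretrained
    try:
        return loader(repo_id)
    except OSError:
        # Assumption: the repo has no tokenizer files, as in the error above,
        # so fall back to the tokenizer the model was trained with.
        return loader(fallback_id)


# Example usage (requires transformers and network access):
# tokenizer = load_tokenizer_with_fallback(
#     "phosseini/glucose-roberta-large", "roberta-large"
# )
```

This only changes how the tokenizer is loaded in your own code; it does not change the snippet shown by the Use in Transformers button. Uploading the tokenizer files to the model repository itself (for example by saving the roberta-large tokenizer and pushing it to the same repo) would likely let the original snippet work unchanged, though I have not verified that for this model.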
