Tokenizer is not being loaded on Hugging Face Inference

I was using the following code to fine-tune GPT-2 Medium.

tokenizer = GPT2Tokenizer.from_pretrained('gpt2', bos_token='<|startoftext|>', eos_token='<|endoftext|>', pad_token='<|pad|>') #gpt2-medium
configuration = GPT2Config.from_pretrained('gpt2', output_hidden_states=False)
model = GPT2LMHeadModel.from_pretrained("gpt2", config=configuration)

# after finetuning
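After fine-tuning I save everything to an output directory before uploading. A minimal sketch of that step (the directory name is an assumption; the important part is that both save_pretrained calls run, since forgetting the tokenizer one is a common cause of missing files on the hub):

```python
import os

def save_for_hub(model, tokenizer, output_dir):
    """Save the fine-tuned model AND the tokenizer into one directory.

    model.save_pretrained writes config.json plus the weights;
    tokenizer.save_pretrained writes vocab.json, merges.txt,
    tokenizer_config.json and special_tokens_map.json (the last one
    records the custom bos/eos/pad tokens added at load time).
    """
    os.makedirs(output_dir, exist_ok=True)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

If only the model is saved (or only the model files are uploaded), the repo ends up without the tokenizer files, which would explain a "No such file or directory" error from from_pretrained.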


But when I upload the model and tokenizer to the Model Hub (ritwikm/gandhi-gpt), I see the following error:

Can't load tokenizer using from_pretrained, please update its configuration: No such file or directory (os error 2)

On my local machine, I load the same tokenizer and model with the following lines:

model = model.from_pretrained(output_dir).to(device)
tokenizer = tokenizer.from_pretrained(output_dir)

And it works fine. It only fails on the hosted inference.

I have tried several fixes, such as keeping only the tokenizer files that are present in the official gpt2 repo.
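One check that narrows this down is comparing the files in the uploaded repo against what GPT2Tokenizer expects to find. A small sketch (the helper only inspects a list of filenames; the commented usage line, which needs the huggingface_hub package and network access, is the only assumption):

```python
# Files GPT2Tokenizer.from_pretrained looks for: vocab.json and merges.txt
# are required; the other two record the custom bos/eos/pad tokens.
REQUIRED = {
    "vocab.json",
    "merges.txt",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

def missing_tokenizer_files(repo_files):
    """Return the expected tokenizer files absent from a repo's file list."""
    return sorted(REQUIRED - set(repo_files))

# Usage (needs network + huggingface_hub):
# from huggingface_hub import list_repo_files
# print(missing_tokenizer_files(list_repo_files("ritwikm/gandhi-gpt")))
```

If any of these names come back as missing, the "No such file or directory" error on inference would be consistent with that.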

I also tried the suggestion in the post by bala1802, but even that didn't help.

What can be done?