Model Inference API error

dilolo · May 5, 2024, 2:00pm

Hi,
I fine-tuned XLM-RoBERTa to detect the language of a text and uploaded the resulting model to a Hugging Face repository. When I try to test it with the Inference API on the model page, I get this error:
Can’t load tokenizer using from_pretrained, please update its configuration: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3
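The position in that message ("line 83 column 3") appears to refer to the tokenizer's serialized tokenizer.json, and the name PyPreTokenizerTypeWrapper suggests that its `pre_tokenizer` entry is the part that failed to parse. That reading is an assumption, not something confirmed in this thread. A minimal stdlib sketch for looking at that spot in a local copy of the file (for example one written by the fast tokenizer's `save_pretrained`; the path in the comments is hypothetical):

```python
import json
from pathlib import Path


def show_context(path, line, radius=3):
    """Return the lines around a 1-based line number, prefixed with their numbers."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    start = max(line - 1 - radius, 0)
    end = min(line + radius, len(lines))
    return [f"{i + 1:>5}: {lines[i]}" for i in range(start, end)]


def describe_pre_tokenizer(path):
    """List (type, field names) for each pre-tokenizer in a tokenizer.json file."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    pre = config.get("pre_tokenizer")
    if pre is None:
        return []
    # A "Sequence" pre-tokenizer stores its children under "pretokenizers".
    children = pre.get("pretokenizers", []) if pre.get("type") == "Sequence" else [pre]
    return [(child.get("type"), sorted(k for k in child if k != "type")) for child in children]


# Example (hypothetical path, e.g. after tokenizer.save_pretrained("my_repo")):
# print("\n".join(show_context("my_repo/tokenizer.json", 83)))
# print(describe_pre_tokenizer("my_repo/tokenizer.json"))
```

Comparing the reported fields with those in a tokenizer.json that does load in the failing environment can show which entry the reader rejects.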

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=128)

# `dataset` is a DatasetDict with 'train' and 'validation' splits, loaded earlier (not shown)
# Apply tokenization to each split
dataset = dataset.map(tokenize_function, batched=True)

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('xlm-roberta-base', num_labels=21)

from transformers import Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

training_args = TrainingArguments(
    output_dir='./results',          
    num_train_epochs=3,              
    per_device_train_batch_size=16,  
    per_device_eval_batch_size=64,   
    warmup_steps=500,                
    weight_decay=0.01,               
    logging_dir='./logs',            
    logging_steps=10,
    evaluation_strategy="epoch",
    save_strategy="epoch"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['validation'],
    compute_metrics=compute_metrics
)

trainer.train()


from huggingface_hub import notebook_login

notebook_login()

repo_name = "my_repo"

model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

What could be causing this?
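Reports of this exact message elsewhere are often traced to a mismatch between the `tokenizers` version that wrote tokenizer.json and the one that reads it; that is an assumption here, not a diagnosis confirmed in this thread. A small stdlib sketch for recording the relevant library versions, so the training environment can be compared with the one that loads the model:

```python
from importlib.metadata import PackageNotFoundError, version


def library_versions(packages=("transformers", "tokenizers", "huggingface_hub")):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Run in the training notebook and wherever the model is loaded, then compare:
# print(library_versions())
```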

nielsr · May 6, 2024, 9:02am

Thanks for reporting; I'll ping the team.

This looks like the same issue as "Can't load tokenizer using from_pretrained, Inference API".

asif00 · May 6, 2024, 2:03pm

I am facing the same issue. It still doesn't work with the model's default Inference API, but I got it working with MBart50Tokenizer in a Hugging Face Space. Here is the Space where I tested it.

dilolo · May 6, 2024, 3:02pm

I also tried running it in a Space, and I get an error there too.

asif00 · May 6, 2024, 3:16pm

Could you please try XLMRobertaTokenizer instead of AutoTokenizer and let me know if it works?

dilolo · May 6, 2024, 5:50pm

@asif00 Thank you very much, Asif. I really should have used XLMRobertaTokenizer.
Thanks

asif00 · May 6, 2024, 6:10pm

You are welcome!!!

system · May 7, 2024, 6:10am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
