I’m following the tutorial on how to fine-tune BERT with the PyTorch Trainer for a binary classification task.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
My code is the same as the tutorial above, just with num_labels=2. When training is finished, I save the model as follows:
trainer.save_model('model_path')
Then I load it with
trained_model = AutoModelForSequenceClassification.from_pretrained("model_path", num_labels=2)
Following this topic for prediction, I try to load the model into a pipeline like so:
from transformers import BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
clf = pipeline("text-classification", trained_model, tokenizer)
The last line raises an exception:
Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.
I’ve tried different variations of the above, like using AutoTokenizer, or saving the tokenizer before training and then loading it afterwards, but the error persists.
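My guess is that the problem might be positional arguments: if pipeline's signature is something like pipeline(task, model=None, config=None, tokenizer=None, ...) (an assumption — check the signature in your transformers version), then passing the tokenizer as the third positional argument would bind it to the config slot, leaving tokenizer=None. A minimal sketch of that pitfall with a stand-in function (fake_pipeline is hypothetical, not the real transformers API):

```python
# Stand-in mimicking a signature like pipeline(task, model=None, config=None, tokenizer=None).
# Purely an illustration of the positional-argument pitfall, not the real transformers API.
def fake_pipeline(task, model=None, config=None, tokenizer=None):
    return {"task": task, "model": model, "config": config, "tokenizer": tokenizer}

# Positional call: the tokenizer ends up bound to `config`, so tokenizer stays None.
result = fake_pipeline("text-classification", "my-model", "my-tokenizer")
print(result["tokenizer"])  # None
print(result["config"])     # my-tokenizer

# Keyword call binds it where intended.
result = fake_pipeline("text-classification", model="my-model", tokenizer="my-tokenizer")
print(result["tokenizer"])  # my-tokenizer
```

If that guess is right, calling pipeline with tokenizer=tokenizer as a keyword argument might behave differently, but I haven't confirmed this.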