'Impossible to guess which tokenizer to use' while loading fine-tuned model on pipeline

I’m following the tutorial on how to fine-tune BERT with the PyTorch Trainer for a binary classification task.

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

My code is the same as in the tutorial above, just with num_labels=2. When training is finished, I save the model as follows:

trainer.save_model('model_path')

Then I load it with

trained_model = AutoModelForSequenceClassification.from_pretrained("model_path", num_labels=2)

Following this topic on prediction, I try to load it into a pipeline like this:

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

from transformers import pipeline
clf = pipeline("text-classification", trained_model, tokenizer)

Exception on the last line:

Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.

I’ve been trying different variations of the above, like using AutoTokenizer, or saving the tokenizer before training and then loading it after, but the error persists.
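
One likely cause, assuming `pipeline` takes its leading parameters in the order `task, model, config, tokenizer` (as in the transformers 4.x releases): the third positional argument is bound to `config`, not `tokenizer`. The pipeline then never receives the tokenizer, and it cannot infer one from a model object that is not a path or identifier string. The sketch below uses a hypothetical stand-in function, `pipeline_like`, that only mirrors that assumed parameter order to show where each argument lands. It does not call transformers.

```python
import inspect


def pipeline_like(task=None, model=None, config=None, tokenizer=None):
    """Hypothetical stand-in that only mirrors the assumed parameter order."""


def bound_arguments(*args, **kwargs):
    """Return a dict mapping parameter names to the values they would receive."""
    signature = inspect.signature(pipeline_like)
    return dict(signature.bind(*args, **kwargs).arguments)


# Placeholder strings stand in for the real model and tokenizer objects.
positional = bound_arguments("text-classification", "trained_model", "tokenizer")
keyword = bound_arguments(
    "text-classification", model="trained_model", tokenizer="tokenizer"
)

print(positional)  # the tokenizer placeholder is bound to "config"
print(keyword)     # the tokenizer placeholder is bound to "tokenizer"
```

If that assumption holds for the installed version, passing the arguments by keyword should avoid the mix-up: pipeline("text-classification", model=trained_model, tokenizer=tokenizer).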

I have solved this with the following: in the trainer definition, I added the tokenizer argument:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer
)

After training I save the model: trainer.save_model("model_path")

and when loading it with the pipeline, I just use the path as the model input:

clf = pipeline("text-classification", 'model_path')
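
This works because, when the Trainer is given a tokenizer, trainer.save_model writes the tokenizer files next to the model weights, so the pipeline can resolve both from the one path. A quick way to confirm this is to look for tokenizer files in the saved directory. The sketch below is a standard-library-only check. The helper name `missing_tokenizer_files` is made up for illustration, and the file names are the ones BERT tokenizers usually write (an assumption; other tokenizers use different vocabulary files).

```python
from pathlib import Path

# Assumption: files that a BERT tokenizer's save_pretrained usually writes.
# The vocabulary may be vocab.txt (slow tokenizer) or tokenizer.json (fast tokenizer).
CONFIG_FILE = "tokenizer_config.json"
VOCAB_FILES = ("vocab.txt", "tokenizer.json")


def missing_tokenizer_files(model_dir):
    """Return the tokenizer files that appear to be missing from model_dir."""
    directory = Path(model_dir)
    missing = []
    if not (directory / CONFIG_FILE).is_file():
        missing.append(CONFIG_FILE)
    # Either vocabulary format is enough, so report them as one entry.
    if not any((directory / name).is_file() for name in VOCAB_FILES):
        missing.append(" or ".join(VOCAB_FILES))
    return missing


if __name__ == "__main__":
    problems = missing_tokenizer_files("model_path")
    if problems:
        print("No tokenizer saved here; missing:", ", ".join(problems))
    else:
        print("Tokenizer files found; the path alone should be enough.")
```

If the files are missing, calling tokenizer.save_pretrained("model_path") should add them to the same directory. Recent transformers releases deprecate the tokenizer argument of Trainer in favor of processing_class, so check the documentation for the installed version.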