I’m following the tutorial on how to fine-tune BERT with the PyTorch Trainer for a binary classification task.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
My code is the same as the tutorial above, just with num_labels=2. When training is finished, I save the model as follows:
trainer.save_model('model_path')
Then I load it with
trained_model = AutoModelForSequenceClassification.from_pretrained("model_path", num_labels=2)
Following this topic for prediction, I try to load the model into a pipeline like so:
from transformers import BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
clf = pipeline("text-classification", trained_model, tokenizer)
The last line raises an exception:
Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.
I’ve tried different variations of the above, like using AutoTokenizer, or saving the tokenizer before training and then loading it afterwards, but the error persists.
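My guess is that the problem might be positional arguments: if pipeline's signature is something like pipeline(task, model=None, config=None, tokenizer=None, ...) (an assumption — check the signature in your transformers version), then passing the tokenizer as the third positional argument would bind it to the config slot, leaving tokenizer=None. A minimal sketch of that pitfall with a stand-in function (fake_pipeline is hypothetical, not the real transformers API):

```python
# Stand-in mimicking a signature like pipeline(task, model=None, config=None, tokenizer=None).
# Purely an illustration of the positional-argument pitfall, not the real transformers API.
def fake_pipeline(task, model=None, config=None, tokenizer=None):
    return {"task": task, "model": model, "config": config, "tokenizer": tokenizer}

# Positional call: the tokenizer ends up bound to `config`, so tokenizer stays None.
result = fake_pipeline("text-classification", "my-model", "my-tokenizer")
print(result["tokenizer"])  # None
print(result["config"])     # my-tokenizer

# Keyword call binds it where intended.
result = fake_pipeline("text-classification", model="my-model", tokenizer="my-tokenizer")
print(result["tokenizer"])  # my-tokenizer
```

If that guess is right, calling pipeline with tokenizer=tokenizer as a keyword argument might behave differently, but I haven't confirmed this.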