How to use the fine-tuned model for actual prediction after re-loading it

I’m trying to reload a DistilBertForSequenceClassification model I’ve fine-tuned to classify sentences into their appropriate labels (text classification).

In Google Colab, after successfully training the DistilBERT model, I saved it and then downloaded it:

trainer.train()
trainer.save_model("distilbert_classification")

The downloaded model has three files: config.json, pytorch_model.bin, training_args.bin.

I put them in a folder named ‘distilbert_classification’ somewhere in my Google Drive.

Afterwards, I reloaded the model in a different Colab notebook:


newtrainer = DistilBertForSequenceClassification.from_pretrained('google drive directory/distilbert_classification')

Up to this point, I have succeeded without any errors.

However, how do I use this reloaded model (the ‘newtrainer’ object) to actually make predictions on sentences? What code do I need to use afterwards? I tried

newtrainer.predict("sample sentence") but it doesn’t work. Would appreciate any help!

Hi!

I found a similar topic. Does it help?

If not, can you paste the error?

Yes, I did try this in the past. The thing is, unlike in the link, I downloaded the model, stored it somewhere in my Drive, and reloaded it from its Google Drive path. So my code for reloading looks something like this:

new_model = DistilBertForSequenceClassification.from_pretrained("/content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name")

Which did work.

Now, if I am to use what is suggested in the link:

from transformers import pipeline
clf = pipeline("text-classification", fine_tuned_model)
answer = clf("text")

I do not know what to put in place of the parameter “fine_tuned_model”.

I tried both


clf = pipeline("text-classification", new_model )

and


clf = pipeline("text-classification", /content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name)

Neither attempt works. I get either a syntax error or “OSError: Can’t load tokenizer for ‘/content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name’”.

So the biggest difference between my post and the link is that the other poster saves and reloads the model in the same Colab notebook (which needs only the model name, not a directory), while I reload the model from a Google Drive directory in a separate Colab notebook. I think that’s where my problem lies, and I can’t find the solution.

I have a similar example, but for a language modeling task. I stored the model in Google Drive, reloaded it in a Colab notebook, and it works for me. One difference I noticed is that the folder where I stored my fine-tuned model contains additional files that yours doesn’t have, like tokenizer.json and tokenizer_config.json.
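A quick way to check whether a model folder contains tokenizer files is to list what is missing. This is only a minimal sketch: the helper name `missing_tokenizer_files` is made up for illustration, and `vocab.txt` is my assumption about what a DistilBERT tokenizer usually writes (tokenizer.json is typically only written by the fast tokenizers, so it may legitimately be absent).

```python
from pathlib import Path

# tokenizer.json and tokenizer_config.json are the files named in this thread;
# vocab.txt is an assumption about what DistilBERT's tokenizer writes on save.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "vocab.txt")


def missing_tokenizer_files(model_dir):
    """Return the names from TOKENIZER_FILES that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in TOKENIZER_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Path taken from the question; adjust it to your own Drive layout.
    model_dir = "/content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name"
    print(missing_tokenizer_files(model_dir))
```

If every name comes back as missing, as it would for a folder holding only config.json, pytorch_model.bin and training_args.bin, the tokenizer was most likely never saved there.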

I think the tokenizer you used when fine-tuning the model was not exported to your directory, so the pipeline fails when it tries to load it. Did you try loading the same tokenizer you used for fine-tuning and passing it as an argument to the pipeline? (See the Pipelines documentation.)

I think the code should be something like this (change the tokenizer if you used another one):

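from transformers import DistilBertForSequenceClassification, DistilBertTokenizer, pipeline

# Reload the fine-tuned model from the Drive folder
new_model = DistilBertForSequenceClassification.from_pretrained("/content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name")
# The tokenizer was not saved with the model, so load the one used for fine-tuning
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

# Pass both by keyword: the third positional argument of pipeline() is config, not tokenizer
clf = pipeline("text-classification", model=new_model, tokenizer=tokenizer)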
...
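As an alternative, you could save the tokenizer into the same folder as the model, so the folder can be handed to the pipeline on its own. The sketch below assumes you still have the `trainer` and `tokenizer` objects from your training notebook; the helper names `save_for_inference` and `load_classifier` are made up for illustration, and I have not run this against your setup.

```python
def save_for_inference(trainer, tokenizer, out_dir):
    """Write the model and its tokenizer into the same folder."""
    trainer.save_model(out_dir)         # model config and weights, as in the question
    tokenizer.save_pretrained(out_dir)  # tokenizer files next to the model


def load_classifier(model_dir):
    """Build a text-classification pipeline from a folder holding model and tokenizer."""
    # Imported here so that save_for_inference has no dependency of its own.
    from transformers import pipeline

    # With a path as the model and no tokenizer argument, the pipeline looks
    # for the tokenizer in that same folder.
    return pipeline("text-classification", model=model_dir)
```

In the training notebook you would call save_for_inference(trainer, tokenizer, "distilbert_classification"), and in the new notebook load_classifier("/content/gdrive/MyDrive/Colab Notebooks/BERT/BERT project/model_name"), followed by clf("sample sentence").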

I see, I have to specify the tokenizer as well! It worked after that. Thank you so much!

I have just one more question. I tried the model on one sample and it prints out something like this:

[{'label': '2', 'score': 0.9861796498298645}]

Does the ‘score’ here correspond to accuracy? Or is it some other metric?
Do you know what we use this for?

In the pipeline link you posted, it says

" score (float) — The probability associated to the answer."

I’m not really sure what this means. What is this score, and how is it calculated? Does it have anything to do with the metrics we use during training (like precision, recall, F1)?

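The score is the output of a “sigmoid” or “softmax” function applied to the model’s outputs: sigmoid when the model has a single label, softmax when it has several. It is the probability the model assigns to the returned label for that input, not accuracy or any other evaluation metric.
It’s not immediately obvious where this is documented, but I found two references: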

  1. In the TextClassificationPipeline.__call__() method’s docstring:
function_to_apply (`str`, *optional*, defaults to `"default"`):
    The function to apply to the model outputs in order to retrieve the scores. Accepts four different
    values:
    If this argument is not specified, then it will apply the following functions according to the number
    of labels:
    - If the model has a single label, will apply the sigmoid function on the output.
    - If the model has several labels, will apply the softmax function on the output.
    Possible values are:
    - `"sigmoid"`: Applies the sigmoid function on the output.
    - `"softmax"`: Applies the softmax function on the output.
    - `"none"`: Does not apply any function on the output.
  2. In the TextClassificationPipeline docstring:

If multiple classification labels are available (model.config.num_labels >= 2), the pipeline will run a softmax over the results. If there is a single label, the pipeline will run a sigmoid over the result.

Both docstrings point the same way: unless you set function_to_apply yourself, the pipeline applies a sigmoid or a softmax depending on the number of labels in your model’s config.
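To make the calculation concrete, here is a rough sketch in plain Python of how a score could be derived from the model’s raw outputs (logits). The logits are made up, the function names are mine, and the label is simply the index as a string (the real pipeline takes the label name from the model config), so treat it as an illustration of the idea and not the pipeline’s actual code.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash a single logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def top_prediction(logits):
    """Mimic the default rule: sigmoid for one label, softmax for several."""
    probs = [sigmoid(logits[0])] if len(logits) == 1 else softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Simplification: the label here is just the class index as a string.
    return {"label": str(best), "score": probs[best]}


logits = [-1.0, 0.5, 5.0]  # made-up outputs for a 3-class model
print(top_prediction(logits))
```

So the score says how confident the model is about this one prediction. Metrics like accuracy, precision, recall or F1 are computed separately, by comparing many predictions against the true labels.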
