Easiest way to perform inference

I trained, evaluated, and saved my model: model_bert.save_pretrained("fine_tuned_model").
Now I want to perform inference, so I load it using pipeline: clf = pipeline("text-classification", "/content/fine_tuned_model").

Then I pass an array of text to the model: clf(tx,return_all_scores=True)

Then I get this result:

Disabling tokenizer parallelism, we're using DataLoader multithreading already
[[{'label': 'LABEL_0', 'score': 0.9765037894248962},
  {'label': 'LABEL_1', 'score': 0.30014175176620483},
  {'label': 'LABEL_2', 'score': 0.9280667901039124},
  {'label': 'LABEL_3', 'score': 0.06726877391338348},
  {'label': 'LABEL_4', 'score': 0.8652555346488953},
  {'label': 'LABEL_5', 'score': 0.15145337581634521}],
 [{'label': 'LABEL_0', 'score': 0.8798120021820068},
  {'label': 'LABEL_1', 'score': 0.006957885809242725},
  {'label': 'LABEL_2', 'score': 0.06591048091650009},
  {'label': 'LABEL_3', 'score': 0.0038158840034157038},
  {'label': 'LABEL_4', 'score': 0.06268540769815445},
  {'label': 'LABEL_5', 'score': 0.04680700972676277}]]

My questions are:

  • Is it possible to get the actual labels instead of LABEL_0?
  • What is the best way to evaluate the model when you have a CSV file of text? Is there a special function for that?

You see these “generic” label names because you didn’t specify the correct ones when fine-tuning the model. If you check your model’s configuration like this…

from transformers import AutoModelForSequenceClassification
m = AutoModelForSequenceClassification.from_pretrained("/path/to/fine_tuned_model")
print(m.config.id2label)  # mapping from label ids to label names

You should get something like:

{0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2', 3: 'LABEL_3', 4: 'LABEL_4', 5: 'LABEL_5'}

This is what maps the label ids (0,1,2…) to actual names, so you can simply modify this dictionary to have the label names that you want. For example:

m.config.id2label = {0: 'zero', 1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

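If you perform inference with this modified model now (for example by saving it again with save_pretrained and reloading your pipeline from that folder), you’ll see the new label names (zero, one, two…) in the output. You can change the “label2id” dictionary in the same way.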
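To keep both dictionaries consistent, the renaming can be wrapped in a small helper. This is only a sketch: set_label_names is a made-up name, not part of transformers, and the SimpleNamespace object is a stand-in so the snippet runs without loading a real model. With your own model you would pass m instead.

```python
from types import SimpleNamespace


def set_label_names(model, names):
    """Set config.id2label and config.label2id from an ordered list of names."""
    id2label = dict(enumerate(names))
    model.config.id2label = id2label
    # label2id is the inverse mapping, so both stay in sync
    model.config.label2id = {name: i for i, name in id2label.items()}
    return model


# Stand-in for a loaded model (assumption: only .config is needed here)
fake_model = SimpleNamespace(config=SimpleNamespace(id2label={}, label2id={}))
set_label_names(fake_model, ["zero", "one", "two", "three", "four", "five"])
print(fake_model.config.id2label)

# With the real model from above, something like:
#   set_label_names(m, ["zero", "one", "two", "three", "four", "five"])
#   m.save_pretrained("/path/to/fine_tuned_model")  # writes the names to config.json
#   clf = pipeline("text-classification", "/path/to/fine_tuned_model")
```

Saving the model again means the names are stored in its config, so you should not have to repeat this step every time you load it.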

And about the CSV question, I don’t think there’s a special function for that… just read the file (into a dataset object, a pandas dataframe or whatever you prefer) and iteratively provide texts to your pipeline, either one by one or in batches.
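As a rough illustration of that loop, here is a minimal sketch that reads a CSV with the standard library, sends the texts to the classifier in batches, and computes accuracy. The function name evaluate_csv and the column names "text" and "label" are assumptions, so adjust them to your file. It also assumes the gold labels in the CSV use the same names as the model's id2label, and that clf returns one {"label": ..., "score": ...} dict per text (the pipeline's default when you don't ask for all scores).

```python
import csv


def evaluate_csv(clf, path, text_col="text", label_col="label", batch_size=32):
    """Return the accuracy of clf on a CSV file with text and gold-label columns."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    correct = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # clf receives a list of strings and returns one prediction dict per string
        preds = clf([row[text_col] for row in batch])
        correct += sum(p["label"] == row[label_col] for p, row in zip(preds, batch))

    # Avoid dividing by zero on an empty file
    return correct / len(rows) if rows else 0.0


# Usage with the pipeline from the question (hypothetical file name):
#   accuracy = evaluate_csv(clf, "test.csv")
```

If you need more than accuracy, you could collect the predicted and gold labels in two lists instead and pass them to whatever metric function you prefer.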

