How to test my text classification model after training it?


I have followed this tutorial on text classification: notebooks/text_classification.ipynb at master · huggingface/notebooks · GitHub

Now, I have trained it using my own data, but I am unsure how to actually deploy it to carry out a classification task.

For example, I want to input the following sentence: “You look good today.” And, from there, I want to see if the classification is positive or negative, but I do not know how to do this.

Any help is much appreciated.

That’s a good question. cc @sgugger, it would be great if the notebooks also included an inference part. I had to look through several notebooks before finding out that you can access the trained model using trainer.model. Here’s how to do inference on a new, unseen sentence:

import torch

sentence = "You look good today."
# encode sentence (i.e. create input_ids, attention_mask) as PyTorch tensors
encoding = tokenizer(sentence, return_tensors="pt")
# make sure the values of the "encoding" dict are on the same device as the model
encoding = {k: v.to(trainer.model.device) for k, v in encoding.items()}
# switch off dropout, then do a forward pass through the model without tracking gradients
trainer.model.eval()
with torch.no_grad():
    outputs = trainer.model(**encoding)
logits = outputs.logits
print("Predicted class index:", logits.argmax(-1).item())

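The predicted class index will be either zero or one. Which of the two means positive depends on how the labels were encoded in your dataset (I guess one represents positive); if label names were set on the model, trainer.model.config.id2label shows the mapping.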
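
To turn the raw logits into a readable label and a confidence, a softmax plus a lookup in the label mapping is usually enough. The sketch below is framework-free so that it runs on its own. The function name label_from_logits and the example id2label dictionary are made up for illustration. With the model from the tutorial you would pass logits[0].tolist() and trainer.model.config.id2label instead, which only holds meaningful names if they were set when the model was created (otherwise it contains placeholders such as LABEL_0 and LABEL_1).

```python
import math

def label_from_logits(logits, id2label):
    """Return (label, probability) for one example's list of logits."""
    # numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # the index with the highest probability is the predicted class
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# hypothetical mapping; check your own dataset to see which index means what
id2label = {0: "negative", 1: "positive"}
label, prob = label_from_logits([-1.2, 2.3], id2label)
print(label, round(prob, 3))
```

The probability is mostly useful as a rough confidence signal; it is not guaranteed to be well calibrated.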

Hello, @nielsr - thank you very much for that solution; I will try that with my notebook. I think the notebooks are great, but, like you said, it would be even better if they included an inference part showing how to deploy the trained model. Also, while you are here, are you able to answer my other question? Please see this link: [HELP] How to include emojis in masked language modelling?

Once you have called trainer.save_model(any_folder), you can access your model either:

  • using the pipeline API
  • or by re-instantiating it with the from_pretrained method of your model class
    In both cases, just use the name of your local folder as the identifier.

If you do trainer.push_to_hub(a_repo_name) you can do the same from any machine, and even use the widget inference API on the model hub.
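
As a rough sketch of the two loading options: the helper names below (load_with_pipeline, load_with_from_pretrained) are placeholders made up for illustration, and the code assumes the tokenizer was saved alongside the model. As far as I know, trainer.save_model does this when the tokenizer was passed to the Trainer; otherwise, calling tokenizer.save_pretrained on the same folder should cover it.

```python
def load_with_pipeline(model_dir):
    """Build a text-classification pipeline from a saved folder or Hub repo name."""
    # imported lazily so the helper can be defined without loading the library
    from transformers import pipeline
    return pipeline("text-classification", model=model_dir)

def load_with_from_pretrained(model_dir):
    """Reload tokenizer and model explicitly, e.g. for a manual forward pass."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # disable dropout for inference
    return tokenizer, model
```

Calling load_with_pipeline on the folder given to trainer.save_model, and then calling the result on "You look good today.", should return a list with one dictionary containing a label and a score. The same helpers should work with a Hub repo name in place of the local folder.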

Hi, @sgugger - thanks for that. Would it be possible for you to take a look at this issue I am having? Here is the link: [HELP] How to include emojis in masked language modelling?

I understand that I need to add custom tokens, which are emojis in my case, but I do not know how to do it. I believe that I need to use this code somewhere:

from transformers import BertModel, BertTokenizerFast

# Let's see how to increase the vocabulary of the BERT model and tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

num_added_toks = tokenizer.add_tokens(['🤗'])
print('We have added', num_added_toks, 'tokens')
# Notice: resize_token_embeddings expects to receive the full size of the new vocabulary, i.e., the length of the tokenizer.