How to test my text classification model after training it?

anon58275033 · June 10, 2021, 10:39am

Hello,

I have followed this tutorial on text classification: notebooks/text_classification.ipynb at master · huggingface/notebooks · GitHub

Now, I have trained it using my own data, but I am unsure how to actually deploy it to carry out a classification task.

For example, I want to input the following sentence: “You look good today.” And, from there, I want to see if the classification is positive or negative, but I do not know how to do this.

Any help is much appreciated.

nielsr · June 10, 2021, 12:50pm

That’s a good question. cc @sgugger, it would be great if the notebooks also included an inference part. I had to look into several notebooks before finding out that you can access the trained model using trainer.model. Here’s how to do inference on a new, unseen sentence:

import torch

sentence = "You look good today."
# encode sentence (i.e. create input_ids, attention_mask) as PyTorch tensors
encoding = tokenizer(sentence, return_tensors="pt")
# make sure the values of the "encoding" dict are on the same device as the model
encoding = {k: v.to(trainer.args.device) for k, v in encoding.items()}
# forward pass through the model
with torch.no_grad():
    outputs = trainer.model(**encoding)
logits = outputs.logits
print("Predicted class index:", logits.argmax(-1).item())

The predicted class index will be either zero or one (I guess one represents positive, but that depends on how your labels were encoded).
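
To turn the raw logits into something more readable, one option is to apply a softmax and look the index up in a label mapping. The sketch below uses plain Python so it runs without a model. The logits and the id2label dictionary are made-up values for illustration. With a real model you would likely take the logits from outputs.logits[0].tolist() and the mapping from trainer.model.config.id2label (which may just contain generic names such as LABEL_0 and LABEL_1 if no label names were set during training).

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, id2label):
    """Return the most likely label name and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return id2label[idx], probs[idx]


# Hypothetical values, for illustration only.
example_logits = [-1.2, 2.3]
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}

label, prob = describe_prediction(example_logits, example_id2label)
print(label, round(prob, 3))
```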

anon58275033 · June 10, 2021, 1:15pm

Hello, @nielsr - thank you very much for that solution; I will try that with my notebook. I think the notebooks are great, but, like you said, it would be great if they included an inference part showing how to deploy the trained model. Also, while you are here, are you able to answer my other question? Please see this link: [HELP] How to include emojis in masked language modelling?

sgugger · June 10, 2021, 8:05pm

Once you have called trainer.save_model(any_folder), you can access your model in either of two ways:

- using the pipeline API
- re-instantiating it with from_pretrained (called on the model class)

In both cases, just use the name of your local folder as the identifier.
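
A minimal sketch of both options, assuming the tokenizer was saved into the same folder as the model (which should be the case if the tokenizer was passed to the Trainer). The function names and the "my_model" folder are hypothetical. The transformers imports are placed inside the functions only so the snippet can be loaded on its own.

```python
def load_with_pipeline(model_dir):
    """Option 1: wrap the saved model and tokenizer in a pipeline."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def load_with_from_pretrained(model_dir):
    """Option 2: re-instantiate the tokenizer and model from the folder."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()  # switch off dropout for inference
    return tokenizer, model


# Example usage (assumes trainer.save_model("my_model") was run earlier):
# classifier = load_with_pipeline("my_model")
# print(classifier("You look good today."))
```

The pipeline call returns a list with one dictionary per input, holding the predicted label and its score, so it is usually the shorter route when you only need predictions.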

If you do trainer.push_to_hub(a_repo_name) you can do the same from any machine, and even use the widget inference API on the model hub.

anon58275033 · June 10, 2021, 8:17pm

Hi, @sgugger - thanks for that. Would it be possible for you to take a look at this issue I am having? Here is the link: [HELP] How to include emojis in masked language modelling?

I understand that I need to add custom tokens, which are emojis in my case, but I do not know how to do it. I believe that I need to use this code somewhere:

from transformers import BertModel, BertTokenizerFast

# Let's see how to increase the vocabulary of the BERT model and tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

num_added_toks = tokenizer.add_tokens(['🤗'])
print('We have added', num_added_toks, 'tokens')
# Notice: resize_token_embeddings expects to receive the full size of the new vocabulary, i.e., the length of the tokenizer.
model.resize_token_embeddings(len(tokenizer))
