How to get XLM-T classification output from the scores?

aandyrea · January 2, 2023, 1:56pm

I ran XLM-T (aka Twitter XLM) on my dataset of tweets using the following code:

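# Assumption: softmax here is scipy.special.softmax
from scipy.special import softmax

# tokenizer and model are the XLM-T tokenizer and classification model, loaded beforehand
def predict(new_text):
  encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True)
  output = model(**encoded_input)
  # output[0] holds the logits; the second [0] picks the single text in the batch
  scores = output[0][0].detach().numpy()
  # turn the logits into probabilities that sum to 1
  scores = softmax(scores, axis=-1)
  return scores

df['scores'] = df['new text'].apply(predict)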

Each entry in ‘scores’ is a set of three probabilities: the first is the probability that the tweet is negative, the second that it is neutral, and the third that it is positive. For example:

[0.013780469, 0.94494355, 0.041276094]

FYI: The data type of the ‘scores’ column is object.

I’m wondering how I can convert this into a single class (e.g., ‘Negative’). My idea was to split the scores into three new columns and then create a column called ‘Class’ based on which of the three new columns is the largest. However, I don’t know how to go about doing that, or whether there is a better option.
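For illustration, here is a minimal sketch of the idea described above, not a definitive solution. It assumes the label order given in the post (negative, neutral, positive) and uses a small toy DataFrame in place of the real one; the names `LABELS`, `scores_to_class`, and the `p_*` columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed label order, taken from the post: index 0 = negative, 1 = neutral, 2 = positive
LABELS = ['Negative', 'Neutral', 'Positive']

def scores_to_class(scores):
    # np.argmax gives the position of the largest probability
    return LABELS[int(np.argmax(scores))]

# Toy data standing in for the real df; each cell holds one array of three probabilities
df = pd.DataFrame({'scores': [
    np.array([0.013780469, 0.94494355, 0.041276094]),
    np.array([0.80, 0.15, 0.05]),
]})

# Single class per row, computed directly from the array
df['Class'] = df['scores'].apply(scores_to_class)

# Optional: also split the array into three separate probability columns
prob_cols = ['p_negative', 'p_neutral', 'p_positive']  # hypothetical column names
probs = pd.DataFrame(df['scores'].tolist(), columns=prob_cols, index=df.index)
df = df.join(probs)

print(df[['Class'] + prob_cols])
```

Taking the argmax directly on each array avoids the intermediate columns; the split is only needed if the individual probabilities are wanted later.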
