I ran XLM-T (aka Twitter XLM) on my dataset of tweets using the following code:
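from scipy.special import softmax  # assumed source of softmax; tokenizer and model are loaded elsewhere

def predict(new_text):
    # Tokenize one tweet and return PyTorch tensors
    encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True)
    output = model(**encoded_input)
    # output[0] holds the logits; [0] selects the single example in the batch
    scores = output[0][0].detach().numpy()
    # Convert logits to probabilities
    scores = softmax(scores, axis=-1)
    return scores

df['scores'] = df['new text'].apply(predict)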
Each entry in the ‘scores’ column is an array of three probabilities: the first is the probability that the tweet is negative, the second that it is neutral, and the third that it is positive. For example:
[0.013780469, 0.94494355, 0.041276094]
FYI: The data type of the ‘scores’ column is object.
I’m wondering how I can convert this into a single class (e.g., ‘Negative’). My idea was to split the scores into three new columns and then create a column called ‘Class’ based on which of the three new columns is largest. However, I don’t know how to go about doing that, or whether there is a better option.
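For reference, the idea above could be sketched roughly like this. It assumes the label order is the one described here (negative, neutral, positive) and uses a small toy DataFrame in place of the real one; the second row of scores and the names LABELS, scores_to_class, neg, neu and pos are made up for illustration. Taking the argmax of each array gives the class directly, so the three extra columns are optional:

```python
import numpy as np
import pandas as pd

# Assumed label order, as described in the post:
# index 0 = negative, 1 = neutral, 2 = positive
LABELS = ['Negative', 'Neutral', 'Positive']

def scores_to_class(scores):
    """Return the label whose probability is largest."""
    return LABELS[int(np.argmax(scores))]

# Toy stand-in for the real df. The first row is the example from the post;
# the second row is invented purely for illustration.
df = pd.DataFrame({
    'scores': [
        np.array([0.013780469, 0.94494355, 0.041276094]),
        np.array([0.80, 0.15, 0.05]),
    ]
})

# One class per row, taken straight from the array of probabilities
df['Class'] = df['scores'].apply(scores_to_class)

# Optional: also split the probabilities into three separate columns
probs = pd.DataFrame(df['scores'].tolist(), index=df.index,
                     columns=['neg', 'neu', 'pos'])
df = df.join(probs)
```

Ties are broken in favour of the first maximum, since that is how np.argmax behaves.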