Hello, I am fine-tuning a Hugging Face model on a multi-class classification task with 3 labels, encoded as 0, 1, and 2. I use the cross-entropy loss function to compute the loss.
When training, I tried to get the probabilities, but I observe that they do not correspond to the final label predicted by the classification model. For industrial purposes, I need to set a probability threshold so that not every text given to the model is returned after classification. But since the probabilities do not correspond to the labels, how can I interpret them? In short, I need the right probabilities in order to introduce a threshold on what is returned after the classification is done.
For the probabilities I used this line of code: `proba = nn.functional.softmax(logits, dim=1)`
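For what it is worth, my understanding is that softmax is monotonic within each row, so the argmax of `proba` should always match the argmax of the raw logits. A quick sanity check of that assumption (dummy logits, nothing from my model):

```python
import torch
import torch.nn as nn

# dummy batch of 3-class logits, just to check the monotonicity assumption
logits = torch.randn(4, 3)
proba = nn.functional.softmax(logits, dim=1)

# softmax preserves the per-row ordering, so the argmax must be identical
assert torch.equal(logits.argmax(dim=1), proba.argmax(dim=1))
```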
Probabilities => predicted label:

```
[0.1701, 0.4728, 0.3571] => 1
[0.2768, 0.4665, 0.2567] => 1
[0.2286, 0.5702, 0.2012] => 1
[0.2479, 0.5934, 0.1587] => 2   **
[0.2212, 0.5519, 0.2270] => 2   **
[0.2169, 0.5404, 0.2428] => 1
[0.1706, 0.6370, 0.1924] => 1
[0.1836, 0.6960, 0.1203] => 1
```
As seen above, the predicted label for the lines marked with ** is 2, but I do not get why; looking at the probabilities, I thought it would be 1. Maybe it is me who does not understand. Below are the original logits that I converted to probabilities. For the classification model I used the FlaubertForSequenceClassification class.
```
[-0.67542565  0.34714806  0.06658715]
[-0.1786863   0.3430867  -0.25426903]
[-0.2919644   0.6223039  -0.41944826]
[-0.25066078  0.62209827 -0.69668627]   **
[-0.5443676   0.37007216 -0.51845074]   **
[-0.5634354   0.34945157 -0.45065987]
[-0.7058248   0.6116817  -0.58579236]
[-0.7987261   0.5336867  -1.2213029 ]
```
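To double-check my reading of these numbers, I recomputed the softmax and the argmax from the logits above in isolation (the tensor below is just the values pasted from my run):

```python
import torch
import torch.nn as nn

# logits copied from the output above (the ** rows included)
logits = torch.tensor([
    [-0.67542565,  0.34714806,  0.06658715],
    [-0.1786863,   0.3430867,  -0.25426903],
    [-0.2919644,   0.6223039,  -0.41944826],
    [-0.25066078,  0.62209827, -0.69668627],  # **
    [-0.5443676,   0.37007216, -0.51845074],  # **
    [-0.5634354,   0.34945157, -0.45065987],
    [-0.7058248,   0.6116817,  -0.58579236],
    [-0.7987261,   0.5336867,  -1.2213029],
])

proba = nn.functional.softmax(logits, dim=1)
preds = torch.argmax(proba, dim=1)
print(preds)  # tensor([1, 1, 1, 1, 1, 1, 1, 1]) -- every row argmaxes to index 1
```

So from the logits alone I would expect label 1 everywhere, including for the ** rows, which is exactly what confuses me.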
If you have any ideas, please share!
A snippet of the model class:
```python
# extract the hidden representations from the encoder output
hidden_state = encoder_output                 # (bs, seq_len, dim)
pooled_output = hidden_state[:, 0]            # (bs, dim)
# apply dropout
pooled_output = self.dropout(pooled_output)   # (bs, dim)
# feed into the classifier
logits = self.classifier(pooled_output)       # (bs, num_labels)
proba = nn.functional.softmax(logits, dim=1)
# print(type(proba))
print(proba)

# outputs = (probabilities,) + encoder_output[1:]
outputs = (logits,) + encoder_output[1:]      # logits first

if labels is not None:
    # multi-class classification
    loss_fct = torch.nn.CrossEntropyLoss()
    loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    # aggregate outputs
    outputs = (loss,) + outputs

return outputs  # (loss), logits, (hidden_states), (attentions)
```
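For completeness, this is roughly how I plan to consume these outputs at inference time once I trust the probabilities; a minimal sketch, assuming `model` is the fine-tuned classifier, `batch` is an already-tokenized input batch, and 0.5 is just a placeholder threshold:

```python
import torch
import torch.nn as nn

THRESHOLD = 0.5  # placeholder value, to be tuned on a validation set

# `model` and `batch` are assumed to already exist
model.eval()
with torch.no_grad():
    outputs = model(**batch)   # no labels passed, so outputs[0] is the logits
    logits = outputs[0]
    proba = nn.functional.softmax(logits, dim=1)
    confidences, preds = torch.max(proba, dim=1)

# only return predictions whose top probability clears the threshold
for confidence, pred in zip(confidences.tolist(), preds.tolist()):
    if confidence >= THRESHOLD:
        print(f"predicted label {pred} (p={confidence:.3f})")
    else:
        print("below threshold, not returned")
```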