Why don't the probabilities output by the model correspond to the label predicted by the fine-tuned model?


Hello, I fine-tuned a model from Hugging Face on a classification task: a multi-class classification with 3 labels encoded as 0, 1, and 2. I use the cross-entropy loss function to compute the loss.

During training I tried to get the probabilities, but I observe that the probabilities do not correspond to the final label output by the classification model. For industrial purposes I need to set a probability threshold so that not every text given to the model is returned after classification. But since the probabilities do not correspond to the labels, how can I interpret them?

For the probabilities I used this line of code: proba = nn.functional.softmax(logits, dim=1)

probabilities                 => label
[0.1701, 0.4728, 0.3571]      => 1
[0.2768, 0.4665, 0.2567]      => 1
[0.2286, 0.5702, 0.2012]      => 1
**[0.2479, 0.5934, 0.1587]    => 2**
**[0.2212, 0.5519, 0.2270]    => 2**
[0.2169, 0.5404, 0.2428]      => 1
[0.1706, 0.6370, 0.1924]      => 1
[0.1836, 0.6960, 0.1203]      => 1

As seen above, the predicted label for the lines marked with ** is 2, but I do not get why; from observation I thought it would be 1. Maybe it is me who does not understand. Below are the original logits which I converted to probabilities. For the classification model I used the FlaubertForSequenceClassification class.

Logits:

[-0.67542565  0.34714806  0.06658715]
[-0.1786863   0.3430867  -0.25426903]
[-0.2919644   0.6223039  -0.41944826]
**[-0.25066078  0.62209827 -0.69668627]**
**[-0.5443676   0.37007216 -0.51845074]**
[-0.5634354   0.34945157 -0.45065987]
[-0.7058248   0.6116817  -0.58579236]
[-0.7987261   0.5336867  -1.2213029 ]

If you have any ideas!

A snippet of the model class:

		# extract the hidden representations from the encoder output
		hidden_state = encoder_output[0]  # (bs, seq_len, dim)
		pooled_output = hidden_state[:, 0]  # (bs, dim)
		# apply dropout
		pooled_output = self.dropout(pooled_output)  # (bs, dim)
		# feed into the classifier
		logits = self.classifier(pooled_output)  # (bs, num_labels)

		proba = nn.functional.softmax(logits, dim=1)
		print(proba)

		#outputs = (probabilities,) + encoder_output[1:]
		outputs = (logits,) + encoder_output[1:]

		if labels is not None:
			# multi-class classification
			loss_fct = torch.nn.CrossEntropyLoss()
			loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
			# prepend the loss to the outputs
			outputs = (loss,) + outputs

		return outputs  # (loss), logits, (hidden_states), (attentions)

From your probabilities, it looks like the predicted label should be 1 for every row, as you expected, since index 1 always has the highest probability.

How are you generating 2 as the predicted label?
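
Since softmax is monotonic, the argmax over the logits and the argmax over the probabilities always agree, so the model itself cannot be producing 2 for those rows. A minimal sketch to check this on the two highlighted logit rows from your post:

import torch

logits = torch.tensor([[-0.25066078, 0.62209827, -0.69668627],
                       [-0.5443676,  0.37007216, -0.51845074]])
proba = torch.nn.functional.softmax(logits, dim=1)
# Both argmaxes are 1 for both rows, matching the probabilities you printed
print(logits.argmax(dim=1))  # tensor([1, 1])
print(proba.argmax(dim=1))   # tensor([1, 1])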

Hello, thank you. Actually I was printing an earlier prediction and not the prediction derived from the logits; that is why it did not match. I now return the right variable and the logits work. I have a question: can I directly transform the logits to probabilities after classification is done on the test set, so that I can set a threshold?

import numpy as np
import torch
from torch import nn

test_dataset = process_and_tokenize_file(input, input_label)
# trainer.predict returns (predictions, label_ids, metrics); predictions are the raw logits
raw_pred, _, _ = trainer.predict(test_dataset)
print("raw:", raw_pred)
# argmax over the logits gives the predicted label for each example
y_pred = np.argmax(raw_pred, axis=1)
print("predicted labels:", y_pred)
# raw_pred is a numpy array (the logits returned by self.classifier),
# so convert it to a tensor before applying softmax
proba = nn.functional.softmax(torch.from_numpy(raw_pred), dim=-1)

You have already directly transformed the logits to probabilities using the softmax. To set a threshold you could do something like:

threshold = 0.6
# Get the max probability and the index of the highest prediction for each output.
max_probabilities, max_indices = torch.max(proba, dim=1)
# Get a tensor that is True where the probability is above the threshold, and False otherwise.
above_threshold = max_probabilities > threshold
# Keep the predicted label if its probability is above the threshold, and None otherwise.
results = [predicted_label.item() if above_threshold[index] else None
           for index, predicted_label in enumerate(max_indices)]
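
For example, applied to the first and last probability rows from earlier in the thread (a minimal sketch; the 0.6 threshold is only an illustrative value):

import torch

threshold = 0.6
proba = torch.tensor([[0.1701, 0.4728, 0.3571],
                      [0.1836, 0.6960, 0.1203]])
max_probabilities, max_indices = torch.max(proba, dim=1)
above_threshold = max_probabilities > threshold
results = [label.item() if above_threshold[i] else None
           for i, label in enumerate(max_indices)]
print(results)  # [None, 1]: 0.4728 is below the threshold, 0.6960 is above it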


