How to get probabilities per label in a fine-tuning classification task?

Hello, I followed the Hugging Face website to fine-tune FlauBERT for a classification task. What I would like to know is how to get the probabilities for the classification, something like [0.75, 0.85, 0.25], because I have 3 classes. So far, when printing the results, I get the output below, but it seems to correspond to the logits and not the probabilities? Furthermore, it contains negative numbers, and I thought probabilities were positive numbers between 0 and 1.

```
PredictionOutput(predictions=array([[ 0.53947556,  0.42591393, -0.8021714 ],
       [ 1.6963196 , -3.3902004 ,  1.8755357 ],
       [ 1.9264233 , -0.35482746, -2.339029  ],
       ...,
       [ 2.8833866 , -1.1608589 , -1.2109699 ],
       [ 1.1803235 , -1.4036949 ,  0.48559391],
       [ 1.9253297 , -1.0417538 , -1.2987505 ]], dtype=float32), label_ids=array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 1, 0, 2, 0, 0, 2, 0, 0, 1, 0, 1, 2, 2, 2, 1, 2, 0, 0,
       0, 2, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 2, 0, 2, 1, 2, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 0, 2, 0, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 2, 1, 1, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
       1, 1, 0, 2, 0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 0, 0, 1, 1, 2, 1, 1, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 2, 0,
       2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 1, 0, 0, 2, 0,
       2, 2, 0, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 1, 2, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1,
       0, 0, 1, 1, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 2, 2, 1, 1, 2, 0, 2, 1, 1, 1, 0, 2, 0, 0, 0, 2, 2, 0,
       1, 1, 1, 1, 1, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 1, 2, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 1, 2, 0, 1, 0, 1, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 2, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 1, 1, 0,
       1, 0, 0, 1, 0, 1, 2, 2, 0, 1, 1, 0, 2, 1, 0, 0, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 0, 0, 2, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 2, 0, 0, 0,
       1, 0, 2, 2, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 2, 0, 0,
       0, 0, 1, 2, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 2, 0, 1, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 2, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 1, 2, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 0, 0, 1, 0, 2, 0, 0,
       1, 2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 1, 0, 0, 2,
       0, 1, 0, 1, 2, 0, 0, 1, 0, 0]), metrics={'test_loss': 1.164217233657837, 'test_accuracy': 0.565028901734104, 'test_f1_mi': 0.565028901734104, 'test_f1_ma': 0.42953547487160565, 'test_runtime': 1.4322, 'test_samples_per_second': 483.16, 'test_steps_per_second': 7.68})

```


The code for getting these results is adapted from the notebook for fine-tuning on a classification task :slight_smile:
```


import os

import torch
from transformers import FlaubertModel, Trainer, TrainingArguments

# process_and_tokenize_file, compute_metrics and input_file are helper
# functions defined elsewhere (not shown here)

PRE_TRAINED_MODEL_NAME = '/gpfswork/rech/kpf/umg16uw/expe_5/model/sm'

class FlauBertForSequenceClassification(FlaubertModel):
	"""
	FlauBert Model for Classification Tasks.

	"""
	def __init__(self, config, num_labels, freeze_encoder=False):

		"""
		@param    FlauBert: a FlauBertModel object
		@param    classifier: a torch.nn.Module classifier
		@param    freeze_encoder (bool): Set `False` to fine-tune the FlauBERT model
		
		"""

		# instantiate the parent class FlaubertModel
		super().__init__(config)
		
		# the classifier below maps the FlauBERT hidden size (config.emb_dim)
		# to a 512-dimensional layer and then to the number of labels

		# number of target classes
		self.num_labels = num_labels
		
		# instantiate and load a pretrained FlaubertModel 
		self.encoder = FlaubertModel.from_pretrained(PRE_TRAINED_MODEL_NAME)
		

		
		# freeze the encoder parameters if required (Q1)
		if freeze_encoder: 
		  for param in self.encoder.parameters():
			  param.requires_grad = False

		# the classifier: a feed-forward layer attached to the encoder's head
		self.classifier = torch.nn.Sequential(
											  torch.nn.Linear(in_features=config.emb_dim, out_features=512),
											  torch.nn.Tanh(),  # or nn.ReLU()
											  torch.nn.Dropout(p=0.1), 
											  torch.nn.Linear(in_features=512, out_features=self.num_labels, bias=True),
											  )
		# instantiate a dropout function for the classifier's input
		self.dropout = torch.nn.Dropout(p=0.1)


	def forward(
		self,
		input_ids=None,
		attention_mask=None,
		head_mask=None,
		inputs_embeds=None,
		labels=None,
		output_attentions=None,
		output_hidden_states=None,
	):
		# encode a batch of sequences
		encoder_output = self.encoder(
			input_ids=input_ids,
			attention_mask=attention_mask,
			head_mask=head_mask,
			inputs_embeds=inputs_embeds,
			output_attentions=output_attentions,
			output_hidden_states=output_hidden_states,
		)
		# extract the hidden representations from the encoder output
		hidden_state = encoder_output[0]  # (bs, seq_len, dim)
		pooled_output = hidden_state[:, 0]  # (bs, dim)
		# apply dropout
		pooled_output = self.dropout(pooled_output)  # (bs, dim)
		# feed into the classifier
		logits = self.classifier(pooled_output)  # (bs, num_labels)

		outputs = (logits,) + encoder_output[1:]
		
		if labels is not None:
			# multi-class classification: CrossEntropyLoss expects raw logits,
			# since it applies log-softmax internally
			loss_fct = torch.nn.CrossEntropyLoss()
			loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
			outputs = (loss,) + outputs

		return outputs  # (loss), logits, (hidden_states), (attentions)

# `model` is the base FlauBERT model loaded earlier (not shown); its config is reused here
model = FlauBertForSequenceClassification(
    config=model.config, num_labels=3, freeze_encoder=False
)


training_args = TrainingArguments(
    output_dir='/gpfswork/rech/kpf/umg16uw/results_hf/sm',
    logging_dir='/gpfswork/rech/kpf/umg16uw/logs/sm',
    do_train=True,
    do_eval=True,
    evaluation_strategy="steps",
    logging_first_step=True,
    logging_steps=10,
    num_train_epochs=3.0,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=process_and_tokenize_file(X_train, y_train),
    eval_dataset=process_and_tokenize_file(X_val, y_val),
    compute_metrics=compute_metrics,
)

# Train the pre-trained model

# Start the training loop
print("Start training...\n")

train_results = trainer.train()
val_results = trainer.evaluate()


for root, subdirs, files in os.walk(test_dir):
    #print(root, "...")
    #print(files, "...")
    for f in files:
        path_file = os.path.join(root, f)
        input, input_label = input_file(path_file)
        test_dataset = process_and_tokenize_file(input, input_label)
        test_results = trainer.predict(test_dataset)
        print(test_results)  # prints the PredictionOutput shown above

```

Hi,

What models in the Transformers library output are called logits (they show up as `predictions` in your case); these are the unnormalized scores for each class, for every example in the batch. You can turn them into probabilities by applying a softmax operation over the last dimension, like so:

```
import tensorflow as tf

probabilities = tf.math.softmax(predictions, axis=-1)
print(probabilities)
```

Hi, thank you.

But is it possible to use TensorFlow with PyTorch? I used PyTorch for the fine-tuning task.

Can I modify the code you proposed to use torch?

Oh, apologies. I'm also using PyTorch; for some reason I thought you were using TensorFlow.

In PyTorch, it can be done as follows:

```
from torch import nn

probabilities = nn.functional.softmax(predictions, dim=-1)
print(probabilities)
```
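
For example, applying it to the first row of the logits you posted gives values that are all positive and sum to 1 (a quick check; the numbers below are rounded):

```
import torch
from torch import nn

# first row of the predictions array from the original post
logits = torch.tensor([0.53947556, 0.42591393, -0.8021714])
probabilities = nn.functional.softmax(logits, dim=-1)
print(probabilities)        # roughly tensor([0.464, 0.414, 0.121])
print(probabilities.sum())  # 1.0 (up to floating point)
```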

Hey,

I have nearly the same output as emmakelo and get a predictions array containing the logits and the labels. When I use your PyTorch function, I get the following error: “AttributeError: ‘numpy.ndarray’ object has no attribute ‘softmax’”. Do you know how to tackle this? Thanks in advance :slight_smile:

You have to convert the predictions to a tensor and then apply the softmax function:

```
import torch
from torch import nn

#print(test_results)
labels_id = test_results.label_ids
#print("Origine---- :", labels_id)

# trainer.predict returns NumPy arrays, so convert the logits to a tensor first
y_pred_pt = torch.from_numpy(test_results.predictions)

# the raw predictions are the logits returned by self.classifier;
# softmax turns them into probabilities
probs = nn.functional.softmax(y_pred_pt, dim=-1)
#print(probs)

# get the max probability and the index of the highest prediction for each output
max_probabilities, max_indices = torch.max(probs, dim=1)
print(type(max_probabilities))
predicted_labels = test_results.predictions.argmax(-1)
```