How to determine the best threshold for predictions when running inference with a fine-tuned model?

emmakelo · December 21, 2021, 8:57am

Hello, I fine-tuned a model, but the F-score is not very good for certain classes. To avoid a lot of false positives, I decided to set a threshold on the predicted probabilities, and I would like to know how to determine the best threshold.

Should I use the mean, the median, or just look at the accuracy of the model on the test data?

nielsr · December 21, 2021, 2:09pm

Hi,

The best way to determine the threshold is to compute the true positive rate (TPR) and false positive rate (FPR) at different thresholds, and then plot the so-called ROC curve. Each point on the ROC curve corresponds to one threshold and shows the TPR and FPR obtained at that threshold.

Then, selecting the point (i.e. threshold) closest to the top-left corner of the plot (TPR = 1, FPR = 0) will yield the best balance between the two.
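As a rough sketch of that selection step for the binary case, the snippet below uses scikit-learn's `roc_curve` and picks the threshold whose ROC point is closest to the top-left corner. The helper name `closest_to_top_left` and the toy labels and scores are made up for illustration; in practice the scores would be the model's predicted probabilities on a held-out validation set.

```python
import numpy as np
from sklearn.metrics import roc_curve


def closest_to_top_left(y_true, y_score):
    """Return (threshold, tpr, fpr) for the ROC point nearest to (FPR=0, TPR=1).

    Hypothetical helper, not part of scikit-learn.
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    # Euclidean distance of every ROC point to the ideal corner (0, 1).
    distance = np.hypot(fpr, 1.0 - tpr)
    best = int(np.argmin(distance))
    return float(thresholds[best]), float(tpr[best]), float(fpr[best])


# Toy data (assumed): 5 negatives and 3 positives with their predicted scores.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])
y_score = np.array([0.10, 0.20, 0.30, 0.35, 0.60, 0.50, 0.70, 0.90])

threshold, tpr, fpr = closest_to_top_left(y_true, y_score)
print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")

# Predict the positive class whenever the score reaches the chosen threshold.
y_pred = (y_score >= threshold).astype(int)
```

Distance to the corner is only one way to formalise "top left"; maximising TPR minus FPR (Youden's J statistic) is a common alternative and can pick a different point.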

Scikit-learn provides an implementation of this; however, it's for binary classification only. Note that there are extensions for multiclass classification.

emmakelo · December 22, 2021, 3:07pm

Thank you. Actually, I am doing multiclass classification; I forgot to mention that.

charanK · December 25, 2023, 3:32pm

Could you point me to a paper reference for this statement? I would like to include it in my paper.

nielsr · December 25, 2023, 8:50pm

Hi,

This is just taught in most machine learning courses, actually. See for instance "Classification: ROC Curve and AUC | Machine Learning | Google for Developers".

But feel free to cite me
