Why is my SetFit model only outputting two possible class confidence scores?

JFairbairn · January 5, 2025, 7:35pm

I trained a SetFit model with the default logistic regression head on an unbalanced dataset of 5000 examples for a binary classification task. Because the dataset is unbalanced, I was hoping to study the area under the precision-recall curve, which requires class confidence scores rather than “argmax’d” class labels. However, when I used model.predict_proba(ds["text"]), the results weirdly only ever came in one of two class confidence pairs: either 0.9989 & 0.0011 or 0.023045 & 0.97695.

(The actual confidence scores vary slightly around these values, e.g. 0.9988959589582787 or 0.9988959627978705, but I’m assuming that’s noise.)

I am doing the evaluation in a different script after downloading the trained model:

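import pickle

from datasets import Dataset
from setfit import SetFitModel

# Load the fine-tuned SetFit model from the local save directory.
model = SetFitModel.from_pretrained("model_outputs/test_extra_setfit_save")

# Reload the fitted logistic regression head by hand (workaround for the bug mentioned below).
with open("model_outputs/test_extra_setfit_save/model_head.pkl", 'rb') as mhf:
    model.model_head = pickle.load(mhf)

ds = Dataset.load_from_disk("eval_dataset")
# Rename the ground-truth column to "label".
ds = ds.rename_columns({
    "is_hate": "label"
})

# Per-class confidence scores for each input text.
results = model.predict_proba(ds["text"])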

I’m loading the model head manually because of this bug: by default, the loaded model gave a “logistic regression not yet fitted” error.
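To make the concern concrete, here is a minimal sketch in plain Python (no SetFit required) of why only two distinct scores matter for a precision-recall study: with two score levels there are only two thresholds, so the curve has very few operating points. The labels and scores below are hypothetical, and the sketch assumes the positive-class score is taken from the second column of the predict_proba output, which is an assumption about the column order rather than something the post confirms.

```python
def distinct_scores(scores, ndigits=4):
    """Return the sorted distinct score values after rounding away float noise."""
    return sorted({round(s, ndigits) for s in scores})


def average_precision(labels, scores):
    """Average precision: sum of (recall increase) * precision over score thresholds.

    Tied scores are treated as a single threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every example that shares this score.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical data: positive-class scores that take only two values,
# compared with graded scores for the same labels.
labels = [1, 1, 0, 1, 0, 0]
two_level = [0.97695, 0.97695, 0.97695, 0.0011, 0.0011, 0.0011]
graded = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]

print("distinct two-level scores:", distinct_scores(two_level))
print("AP with two-level scores:", average_precision(labels, two_level))
print("AP with graded scores:", average_precision(labels, graded))
```

Running distinct_scores on the real positive-class scores would be a quick way to confirm whether the tiny differences in the trailing digits really are noise.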

VisionWithMo · January 5, 2025, 8:12pm

Exploring the Mystery of Limited Confidence Scores in SetFit Models

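When your SetFit model outputs only two possible class confidence scores, it is natural to ask what causes this behavior. The deeper question may be whether a technical challenge or design consideration worth investigating lies behind it. Let us try to work through the phenomenon from a few key angles.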

Data Imbalance: The Quiet Saboteur

Imagine a training dataset where one or two classes dominate the majority of examples. Your model, eager to optimize, might naturally lean towards these classes.

Solution:

• Perform data augmentation for underrepresented classes.

• Consider oversampling techniques to balance the dataset.
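As one possible reading of the oversampling suggestion, the sketch below duplicates minority-class examples at random until every class matches the largest one. The function name oversample and the sample texts are hypothetical, and this is plain random oversampling rather than anything SetFit-specific. Whether it changes the two-value behaviour described in the question is not established here.

```python
import random
from collections import Counter


def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra copies (with replacement) to reach the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical unbalanced binary dataset: 5 negatives, 1 positive.
texts = ["a", "b", "c", "d", "e", "f"]
labels = [0, 0, 0, 0, 0, 1]

balanced_texts, balanced_labels = oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling should only be applied to the training split; the evaluation set needs to keep its original class balance for a precision-recall study to be meaningful.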

Loss Functions and Activations: A Case of Mismatch

The choice of loss function plays a pivotal role in guiding your model’s learning behavior. A binary cross-entropy loss in a multi-class scenario could confuse the model, while an improperly configured activation function, such as Sigmoid instead of Softmax, might exacerbate the issue.

Solution:

• Verify your loss function aligns with the task (categorical cross-entropy for multi-class problems).

• Ensure the final activation layer suits the problem (Softmax for multi-class classification).
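For a binary task like the one in the question, it may help to note that the two activations coincide: softmax over two logits gives the same positive-class probability as a sigmoid applied to the difference between them. A minimal sketch in plain Python, using hypothetical logit values:

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# With the negative-class logit fixed at 0, the softmax probability of the
# positive class equals sigmoid(z) for every positive-class logit z.
for z in (-3.0, 0.0, 2.5):
    p_softmax = softmax([0.0, z])[1]
    p_sigmoid = sigmoid(z)
    print(z, p_softmax, p_sigmoid)
```

So a sigmoid/softmax mismatch would only produce different probabilities when there are more than two classes.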

Preprocessing: The Hidden Culprit

Even the best models can falter if the data preprocessing pipeline introduces inconsistencies. Mismatched labels or differing encoding strategies between training and inference phases might confuse the model.

Solution:

• Double-check that the labels and formats in your training and testing datasets are consistent.

• Perform a detailed audit of your data pipeline to identify discrepancies.
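A label consistency check of the kind described above could look like the following sketch. The helper label_report and the example label lists are hypothetical; in practice the inputs would be the label columns of the training and evaluation datasets.

```python
from collections import Counter


def label_report(train_labels, eval_labels):
    """Summarize label values and types so train/eval mismatches stand out."""
    train_counts = Counter(train_labels)
    eval_counts = Counter(eval_labels)
    return {
        "train_counts": dict(train_counts),
        "eval_counts": dict(eval_counts),
        # Label values that appear at evaluation time but never in training.
        "unseen_in_training": sorted(set(eval_counts) - set(train_counts), key=repr),
        # Mixed types (e.g. int and str) often point to an encoding mismatch.
        "types": sorted({type(y).__name__ for y in list(train_labels) + list(eval_labels)}),
    }


# Hypothetical labels: the evaluation set contains a stray string label "1".
train_labels = [0, 0, 0, 1, 0, 1]
eval_labels = [0, 1, "1", 0]

report = label_report(train_labels, eval_labels)
print(report)
```

An empty "unseen_in_training" list and a single entry in "types" would suggest the two label columns are encoded the same way.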

Final Thoughts:

Sometimes, the solution to a seemingly technical issue lies in revisiting the fundamentals. Have you considered how your dataset, architecture, and configuration interact? Are there assumptions baked into your model that need reevaluation?

If you’d like to share more details about your model setup, I’d be happy to help you troubleshoot further. Let’s solve this mystery together!
