I have used BERT to classify the Accident Levels.
When I run trainer.predict() on the held-out test dataset, it gives 0 misclassifications with a 100% F1 score.
However, when I use the same model to predict the accident level of a single accident description, it always outputs accident level 1.
The Accident Level variable has class imbalance and 1 is the majority class.
If trainer.predict() gives 0 misclassifications, why does model.predict() give the wrong accident levels? Please note that I am using the same tokenizer and model that I trained on my custom data.
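For reference, my single-description inference looks roughly like this (a simplified sketch; the checkpoint path is a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "my-finetuned-bert"  # placeholder: directory the fine-tuned model was saved to

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()  # disable dropout for deterministic inference

text = "Example accident description"

# Same preprocessing as in training (truncation / max_length)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(dim=-1).item()
print("Predicted accident level id:", pred_id)
```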
Hmmm… By Hugging Chat.
The issue you’re encountering can be attributed to both overfitting and class imbalance [1][2].
Overfitting
Even though trainer.predict() gives 0 misclassifications with a 100% F1 score on the test dataset, this could indicate overfitting. Overfitting occurs when a model learns the training data too well, including its noise and outliers, and performs poorly on unseen data. The usual pattern is test accuracy slightly below training accuracy (as mentioned in Source [1]); in your case the test score matches the training score exactly, and a perfect held-out score is suspicious and often indicates overfitting.
Class Imbalance
The class imbalance issue exacerbates the problem. Since class 1 is the majority class, the model might be biased towards predicting class 1 more often, especially when it is overfitted to the training data. This is a common problem in imbalanced datasets where the model tends to predict the majority class to maximize its accuracy [2].
Potential Solutions
- Stratified Splitting: Ensure that your train-test split is stratified to maintain the class distribution in both sets (first sketch after this list).
- Threshold Adjustment: The default probability threshold of 0.5 might not be suitable for imbalanced data, so consider adjusting the decision rule to improve precision and recall for the minority classes (second sketch below) [2].
- Cost-Sensitive Learning: Weight the loss with class_weight-style weights so that misclassifying the minority classes is penalized more heavily (third sketch below) [2].
- Alternative Metrics: Instead of relying solely on accuracy, use metrics like precision, recall, per-class F1 score, and the confusion matrix to evaluate your model's performance (fourth sketch below) [2].
- Regularization: Apply regularization techniques (e.g. weight decay, fewer epochs) to reduce overfitting and improve the model's generalization to unseen data (fifth sketch below).
- Cross-Validation: Use cross-validation to check that your model's performance is consistent across different subsets of the data (sixth sketch below).
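A minimal sketch of a stratified split with scikit-learn (the texts and labels here are toy placeholders):

```python
from sklearn.model_selection import train_test_split

# Toy placeholders for your accident descriptions and levels
texts = ["desc a", "desc b", "desc c", "desc d", "desc e", "desc f"]
labels = [1, 1, 1, 1, 2, 2]

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels,
    test_size=0.5,
    stratify=labels,   # preserve the class ratio in both splits
    random_state=42,
)
```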
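For multi-class problems there is no single 0.5 threshold; one simple variant of threshold adjustment is prior correction, i.e. dividing the softmax probabilities by the (assumed) training class frequencies before taking the argmax. A toy sketch with made-up numbers:

```python
import numpy as np

# Made-up logits for one example over 5 accident levels
logits = np.array([2.0, 1.6, 0.3, -0.5, -1.0])
probs = np.exp(logits) / np.exp(logits).sum()  # softmax

plain_pred = probs.argmax()  # picks index 0 (level 1, the majority class)

# Assumed training class frequencies (placeholders)
priors = np.array([0.70, 0.15, 0.08, 0.05, 0.02])
adjusted_pred = (probs / priors).argmax()  # now favors rarer classes

print(plain_pred, adjusted_pred)  # 0 vs. 1 in this toy example
```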
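The Hugging Face Trainer has no built-in class_weight argument, but the documented pattern is to subclass it and override compute_loss with a weighted cross-entropy. A sketch, assuming 5 accident levels and purely illustrative weights:

```python
import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    """Trainer that penalizes minority-class mistakes more heavily."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Illustrative weights: rarer classes get larger penalties
        weights = torch.tensor([0.2, 1.0, 1.0, 2.0, 2.0], device=logits.device)
        loss_fct = nn.CrossEntropyLoss(weight=weights)
        loss = loss_fct(logits.view(-1, model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```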
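A compute_metrics function along these lines can be passed to the Trainer so evaluation reports per-class behavior instead of plain accuracy:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    print(confusion_matrix(labels, preds))  # shows where minority classes end up
    return {
        "f1_macro": f1_score(labels, preds, average="macro"),  # all classes equal
        "f1_weighted": f1_score(labels, preds, average="weighted"),
    }
```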
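For regularization with the Trainer, weight decay and keeping the best checkpoint are the usual knobs; values below are illustrative, and argument names can differ slightly across transformers versions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    weight_decay=0.01,            # L2-style penalty on the weights
    num_train_epochs=3,           # fewer epochs also limits overfitting
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # keep the checkpoint with the best eval score
)
```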
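And a stratified cross-validation skeleton (toy data; each fold would fine-tune a fresh model):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

texts = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])  # toy placeholders
labels = np.array([1, 1, 1, 1, 1, 2, 2, 2])

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    # Fine-tune a fresh model on texts[train_idx], evaluate on texts[val_idx];
    # consistent fold scores make a perfect test score more believable.
    print(fold, labels[train_idx], labels[val_idx])
```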
By addressing these issues, you should be able to improve the model’s performance on both the test dataset and individual predictions.
While training the BERT model I split my data into train, validation, and test sets, so my test data is unseen by the model; to evaluate the model during training I used the validation dataset.