Do we need to verify the results of transformer models by running cross-validation or statistical tests? If so, how?

I see some papers that run K-fold cross-validation on transformer models like BERT, but I’m not sure how to do it in HF, or whether we need to do it at all.
Say I want to check whether there’s a real difference between the performance metrics of two models, SA and SB: how do I make sure the performance difference is statistically significant and not just due to random chance?
I suppose I could make a global variable that holds the metrics and append the results to it during the compute_metrics() call. But then how do I calculate the CV score? Most applications I’ve seen use sklearn models, which have a direct .score() call on the model, and most guides show something like:

from sklearn.model_selection import cross_validate

_scoring = ['accuracy', 'precision', 'recall', 'f1']
results = cross_validate(estimator=model,
                         X=_X,
                         y=_y,
                         cv=_cv,
                         scoring=_scoring,
                         return_train_score=True)

I don’t think we can do that directly with HF models, though.
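To make my idea concrete, here is a rough sketch of what I had in mind: sklearn’s StratifiedKFold to produce the splits, a fresh Trainer per fold, and the per-fold metrics collected from trainer.evaluate(). The checkpoint, the hyperparameters, and the load_my_data() helper are all just placeholders, so please correct me if this is the wrong way to go about it:

import numpy as np
from datasets import Dataset
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts, labels = load_my_data()   # placeholder: list of strings, list of int labels
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

def compute_metrics(eval_pred):
    logits, y_true = eval_pred
    y_pred = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(y_true, y_pred),
            "f1": f1_score(y_true, y_pred, average="macro")}

fold_scores = []   # one dict of eval metrics per fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    train_ds = Dataset.from_dict({"text": [texts[i] for i in train_idx],
                                  "label": [labels[i] for i in train_idx]}).map(tokenize, batched=True)
    val_ds = Dataset.from_dict({"text": [texts[i] for i in val_idx],
                                "label": [labels[i] for i in val_idx]}).map(tokenize, batched=True)

    # re-initialise the model from the checkpoint each fold so folds stay independent
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    args = TrainingArguments(output_dir=f"cv-fold-{fold}",
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             report_to="none")
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=val_ds,
                      compute_metrics=compute_metrics)
    trainer.train()
    fold_scores.append(trainer.evaluate())   # dict with eval_accuracy, eval_f1, ...

accs = [s["eval_accuracy"] for s in fold_scores]
print(f"CV accuracy: {np.mean(accs):.4f} +/- {np.std(accs):.4f}")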

So if we cannot do CV, what other statistical methods should we use to validate our results? ANOVA? Kruskal-Wallis?
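For the SA vs. SB comparison specifically, would something along these lines be reasonable? A paired test on the per-fold scores of the two models, assuming both were evaluated on the same folds (the numbers below are made-up placeholders, not real results):

from scipy import stats

sa_f1 = [0.81, 0.83, 0.80, 0.82, 0.84]   # per-fold F1 of model SA (placeholder values)
sb_f1 = [0.84, 0.85, 0.83, 0.86, 0.85]   # per-fold F1 of model SB (placeholder values)

# paired t-test: folds are matched, so test the per-fold differences
t_stat, p_t = stats.ttest_rel(sa_f1, sb_f1)

# non-parametric alternative if the differences don't look normal
w_stat, p_w = stats.wilcoxon(sa_f1, sb_f1)

print(f"paired t-test p = {p_t:.4f}, Wilcoxon p = {p_w:.4f}")

Or is that the wrong tool here, and ANOVA / Kruskal-Wallis (or something like McNemar’s test on the raw predictions) would be more appropriate?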