Do transformers need cross-validation?

Hello, I am training a model and, following the standard documentation, I split the data into training and validation sets and pass them to Trainer(), where I calculate various metrics.
In previous ML projects I used K-fold cross-validation. I have not found any examples of people doing this with Transformers and was wondering whether there is a reason for that. Would using K-fold not improve results?

hi @theudster, i believe the main reasons you don’t see cross-validation with transformers (or deep learning more generally) are that a) transfer learning is quite effective and b) most public applications involve datasets large enough that the effect of picking a bad train/test split is diminished.

having said that, nothing stops you from using cross-validation with transformers and it’s probably useful if you have a smallish number of labelled samples for fine-tuning. i’m also sure it’s heavily used in kaggle comps to get good results :)


how can we use it?
I am using the Trainer class provided by Hugging Face for multi-class classification (with bert-base-uncased), and can’t seem to figure out how to provide a k-fold validation dataset.
Could you please provide an example?
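
Not an official recipe, but one way to do it is to split the indices yourself and build a fresh Trainer for every fold. The sketch below is written under a few assumptions: `kfold_indices`, `cross_validate` and `make_trainer_fold` are hypothetical helper names (they are not part of transformers), `dataset` is an already tokenized `datasets.Dataset` with a label column, and `training_args` is a `TrainingArguments` object you have created elsewhere.

```python
import random
from statistics import mean


def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride of k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_validate(n_samples, run_fold, k=5, seed=42):
    """Call run_fold(train_idx, val_idx) once per fold and average each metric."""
    results = [run_fold(tr, va) for tr, va in kfold_indices(n_samples, k, seed)]
    return {name: mean(r[name] for r in results) for name in results[0]}


def make_trainer_fold(dataset, training_args, num_labels, compute_metrics=None):
    """Build a run_fold callback around Trainer (hypothetical helper).

    Assumes `dataset` is a tokenized datasets.Dataset with a label column.
    """
    # Imported lazily so the splitting helpers above work without transformers.
    from transformers import AutoModelForSequenceClassification, Trainer

    def run_fold(train_idx, val_idx):
        # Reload the pretrained weights so no fold starts from a model
        # that has already been trained on another fold's validation data.
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=num_labels
        )
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=dataset.select(train_idx),
            eval_dataset=dataset.select(val_idx),
            compute_metrics=compute_metrics,
        )
        trainer.train()
        metrics = trainer.evaluate()
        # Keep only numeric entries (eval_loss, etc.) so they can be averaged.
        return {
            name: value
            for name, value in metrics.items()
            if isinstance(value, (int, float))
        }

    return run_fold
```

You would then call something like `cross_validate(len(dataset), make_trainer_fold(dataset, training_args, num_labels=3))`, where `num_labels=3` is just a placeholder. Note that this trains k separate models, so it costs roughly k times a single fine-tuning run. For multi-class problems with imbalanced labels you may prefer stratified folds, for example scikit-learn's `StratifiedKFold`, in place of the plain shuffle used here.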

As an FYI, it is also quite common in (niche) NLP research where we often have to create our own labeled datasets manually, resulting in rather small datasets. But as you say it is not always “worth” the time and effort because transfer learning already works quite well. You may be better off simply doing something like gradually unfreezing layers or explicitly over-fitting (depending on your use-case!).
