Hi, I’m working on a project that uses a BERT model.
Initially, I split the dataset 80/20 into training and testing sets.
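Roughly what the split looks like (a simplified sketch; `texts` and `labels` are placeholders for my real data):

```python
from sklearn.model_selection import train_test_split

# Placeholder data standing in for my real dataset
texts = ["example sentence %d" % i for i in range(100)]
labels = [i % 2 for i in range(100)]

# 80/20 train/test split
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
```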
Then I used the training split to fine-tune the BERT model with the Hugging Face Trainer, and the test split was passed in as the validation set.
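The fine-tuning step, simplified and continuing from the snippet above (`bert-base-uncased`, the dataset wrapper, and the hyperparameters are placeholders, not my exact setup):

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Minimal dataset wrapper so Trainer can consume the raw text/label lists
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=TextDataset(train_texts, train_labels),
    eval_dataset=TextDataset(test_texts, test_labels),  # test split used as validation
)
trainer.train()
```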
After that, I extracted the BERT embeddings, stacked a TF model on top of them, compiled it, and trained it on the same portion of data that was used for fine-tuning.
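The stacking step, again simplified (the [CLS]-embedding extraction in one batch, the layer sizes, and the training settings are placeholders, not my exact code):

```python
import numpy as np
import tensorflow as tf
import torch

# Extract [CLS] embeddings from the fine-tuned BERT backbone
# (single batch here for simplicity; my real code batches the data)
model.eval()
with torch.no_grad():
    enc = tokenizer(train_texts, truncation=True, padding=True, return_tensors="pt")
    out = model.bert(**enc)                       # backbone of the fine-tuned model
    emb = out.last_hidden_state[:, 0, :].numpy()  # [CLS] token embedding per example

# Small TF classifier stacked on top of the extracted embeddings
tf_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(emb.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
tf_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Trained on the SAME split that was used for fine-tuning
tf_model.fit(emb, np.array(train_labels), epochs=5)
```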
My question is:
Is it OK to use the same data portion for both fine-tuning BERT and training the stacked TF model?
Would that cause overfitting?
I searched for similar posts but couldn’t find an answer.
Thanks in advance.