Suspicious metrics in fine-tuning pipelines using the same pretrained models

Hello! I am working on fine-tuning a custom NER model. I am experimenting with 4 different transformer models (DistilBERT, XLM-RoBERTa, GLiNER, etc.) on a custom annotated dataset. I use each of these models in 2 pipelines: one is trained on the original data and the other on an augmented version of the original training data. So basically, I am trying to find out whether augmenting the data improves model performance for my use case.

Both pipelines are run in separate Jupyter notebooks. Apart from the training data, all details are the same in both notebooks. What I noticed is that when I run the second pipeline with the augmented data, the model shows suspicious metrics right from the first epoch, and it overfits within just a few epochs. This happened with every model I mentioned above.

So I wanted to reach out and ask whether I am missing a crucial step in my training pipelines. My doubt is whether the pipeline creates a new instance of the pretrained model every time I call it. Is it possible for the second pipeline to reuse the model that was already trained in the first pipeline? Maybe that is why I am getting suspicious metrics?

Please let me know if there are any good practices to follow in this situation, and whether there are dedicated ways to train a pretrained model on 2 different training datasets for comparison.

Both pipelines are run in separate Jupyter notebooks.

If multiple notebooks are running at the same time on the same machine without separate virtual environments, they can interfere with each other through shared caches.
To avoid this, you can set os.environ["TRANSFORMERS_CACHE"] to a unique directory for each notebook.
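
For example, a minimal sketch (the cache path and checkpoint name are placeholders) of giving each notebook its own cache directory, set before any transformers import:

```python
import os

# Unique cache per notebook so the two runs never share downloaded files.
# Must be set BEFORE importing transformers.
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache_augmented_run"

from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "distilbert-base-uncased"  # placeholder; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=9)
```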

My doubt is whether the pipeline creates a new instance of the pretrained model every time I call it.

As long as you call from_pretrained every time, each run gets a fresh copy of the original pretrained weights. Problems arise when a model object kept in a global variable is reused inside your training code even though you meant to create a fresh local one.
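
Here's a minimal sketch (checkpoint name, label count, and dataset variables are placeholders) of building a fresh model inside a training function and calling that same function once per dataset, which also gives you a clean way to compare the original and augmented runs:

```python
from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

def train_once(train_dataset, eval_dataset, output_dir,
               checkpoint="distilbert-base-uncased", num_labels=9):
    # from_pretrained returns a NEW model object initialized from the
    # original pretrained weights, never from an earlier fine-tuning run.
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3, seed=42)
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    return trainer.evaluate()

# Same function, two datasets -> directly comparable runs:
# metrics_orig = train_once(orig_train, dev_set, "runs/original")
# metrics_aug  = train_once(aug_train,  dev_set, "runs/augmented")
```

Because the model is created inside the function, a stale notebook variable from an earlier run can never leak into the second training run.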

Reusing models between different pipelines is possible in code, but I don’t recommend it because memory management and other issues may arise.

Even so, I'm not sure this explains the overfitting…
The most common cause is insufficient data, but I doubt that would already show up in the first epoch…