Suspicious metrics in fine-tuning pipelines using the same pretrained models

Hello! I am working on fine-tuning a custom NER model. I am experimenting with 4 different transformer models (DistilBERT, XLM-RoBERTa, GLiNER, etc.) on a custom annotated dataset. I use each of these models in 2 pipelines: one is trained on the original data and the other on an augmented version of the original training data. So basically, I am trying to find out whether augmenting the data improves model performance for my use case.

Both pipelines are run in separate Jupyter notebooks. Apart from the training data, all details are the same in both notebooks. What I noticed is that when I run the second pipeline with the augmented data, the model shows suspicious metrics right from the first epoch, and it overfits within just a few epochs. This happened with every model I mentioned above.

So I wanted to reach out and ask whether I am missing a crucial step in my training pipelines. My doubt is whether the pipeline creates a new instance of the pretrained model every time I call it. Is it possible for the second pipeline to reuse the model that was already trained in the first pipeline? Maybe that is why I am getting suspicious metrics?

Please let me know if there are any good practices to follow in this situation, and whether there are dedicated ways to train a pretrained model on 2 different training datasets for comparison.

Both pipelines are run in separate Jupyter notebooks.

If multiple notebooks are running at the same time on the same machine without separate virtual environments, they can interfere with each other through shared caches.
To avoid this, you can set os.environ["TRANSFORMERS_CACHE"] to a unique directory for each notebook.
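
For example, a minimal sketch (the cache path and checkpoint name are placeholders) of giving each notebook its own cache directory, set before any transformers import:

```python
import os

# Unique cache per notebook so the two runs never share downloaded files.
# Must be set BEFORE importing transformers.
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache_augmented_run"

from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "distilbert-base-uncased"  # placeholder; use your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=9)
```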

My doubt is whether the pipeline creates a new instance of the pretrained model every time I call it.

As long as you call from_pretrained every time, each run gets a fresh copy of the original pretrained weights. Problems arise when a model object kept in a global variable is reused inside your training code even though you meant to create a fresh local one.
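
Here's a minimal sketch (checkpoint name, label count, and dataset variables are placeholders) of building a fresh model inside a training function and calling that same function once per dataset, which also gives you a clean way to compare the original and augmented runs:

```python
from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

def train_once(train_dataset, eval_dataset, output_dir,
               checkpoint="distilbert-base-uncased", num_labels=9):
    # from_pretrained returns a NEW model object initialized from the
    # original pretrained weights, never from an earlier fine-tuning run.
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3, seed=42)
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    return trainer.evaluate()

# Same function, two datasets -> directly comparable runs:
# metrics_orig = train_once(orig_train, dev_set, "runs/original")
# metrics_aug  = train_once(aug_train,  dev_set, "runs/augmented")
```

Because the model is created inside the function, a stale notebook variable from an earlier run can never leak into the second training run.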

Reusing models between different pipelines is possible in code, but I don’t recommend it because memory management and other issues may arise.

Even so, I'm not sure this explains the overfitting…
The most common cause is insufficient data, but I doubt that would already show up in the first epoch…