My goal is to fine-tune an LLM for summarization on a dataset of technical texts, yielding a summarizer well suited to academic papers.
I have been attempting to fine-tune facebook/bart-base on the arxiv-summarization dataset (both on HF) in a Kaggle kernel with a P100 GPU.
I have found that training for more than 4-5 epochs in a single run is not possible because the kernel times out, and the fine-tuning so far has not yielded great improvements either.
The strategy I came up with is to train for 3 epochs per session and repeat this five times, with each session resuming from the model fine-tuned in the previous one, for an effective 15 epochs of training.
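To make the plan concrete, here is a toy sketch of the chaining bookkeeping. The actual fine-tuning is stubbed out; `run_session` and the checkpoint names are placeholders, not real training code:

```python
# Toy sketch of the chaining plan: each "session" fine-tunes for 3 epochs,
# starting from whatever checkpoint the previous session saved.
SESSIONS = 5
EPOCHS_PER_SESSION = 3

def run_session(start_checkpoint: str, epochs: int) -> str:
    """Stand-in for one Kaggle run: train `epochs` more epochs on top of
    `start_checkpoint` and return the name of the new checkpoint."""
    done = int(start_checkpoint.split("-")[-1]) if "-" in start_checkpoint else 0
    return f"checkpoint-{done + epochs}"

checkpoint = "base"  # facebook/bart-base on the very first run
for _ in range(SESSIONS):
    checkpoint = run_session(checkpoint, EPOCHS_PER_SESSION)

print(checkpoint)  # checkpoint-15, i.e. 15 effective epochs
```

In practice, each session would save the model (and ideally the optimizer/scheduler state) to the Kaggle working directory or a dataset, and the next session would load from there instead of from facebook/bart-base.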
I have some questions about this approach and would appreciate my peers' thoughts:
- Is this procedure sensible/viable?
- Should the hyperparameters stay the same across all sessions?
- What about the learning rate? Should each session restart from the usual initial value, or should the decay continue across sessions?
- Should each session use the same training data (about 50k samples)? Or should each session get a different, mutually exclusive slice (say, 5 sessions of 25k samples each) to maximize the diversity of what the model sees?
- Should the validation data stay the same across sessions? Keeping it fixed would let me accurately track any improvement in ROUGE scores from session to session.
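To make the learning-rate question concrete, here is a toy comparison (pure Python, made-up base LR) of the two options for a linear-decay-to-zero schedule: restarting the schedule every 3-epoch session versus running one schedule over the full 15 effective epochs:

```python
# Compare the LR seen at effective epoch 7 under the two scheduling options.
BASE_LR = 5e-5  # illustrative starting value, not a recommendation

def linear_decay(base_lr: float, epoch: int, total_epochs: int) -> float:
    """LR after `epoch` epochs of a linear decay to zero over `total_epochs`."""
    return base_lr * (1 - epoch / total_epochs)

# Option A: each session restarts the schedule with a 3-epoch horizon.
# Effective epoch 7 is the 2nd epoch (index 1) of the 3rd session.
lr_restart = linear_decay(BASE_LR, 7 % 3, 3)

# Option B: one schedule spanning all 15 effective epochs.
lr_continue = linear_decay(BASE_LR, 7, 15)

print(lr_restart, lr_continue)  # option A stays higher mid-training
```

The point is just that the two choices put the model on noticeably different LR trajectories, so it seems worth deciding deliberately rather than by default.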
I realize I have asked several questions, but thoughts on any would be appreciated.
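For the validation-tracking question, this is roughly the metric I have in mind. A minimal hand-rolled ROUGE-1 F1 is shown only for illustration; in practice I would compute it on the fixed validation set with the `rouge_score`/`evaluate` packages:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Minimal ROUGE-1 F1: unigram overlap between prediction and reference,
    with whitespace tokenization and lowercasing. Illustration only."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the model summarizes papers",
                  "the model summarizes arxiv papers")
print(round(score, 3))  # 0.889
```

Logging a score like this after every session, against the same validation split, is what I meant by accurately tracking improvement.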