What happens if the fine-tuning is done twice?

Apologies in advance if the question is silly, I’m trying to learn about Hugging Face and NLP in general.

My doubt is the following: let’s suppose that I want to do text generation and I will work with the pre-trained gpt2 model. First, I fine-tune it on an astronomy dataset and save the model as gpt2-astronomy. Then I fine-tune gpt2-astronomy on a physics dataset and save it as final-model.

My question is: will this final-model be good for text generation of astronomy and also for physics? Or by fine-tuning the second time, do I “eliminate” the ability of the model with astronomy subjects?

I ask this question because, as I understand it, when finetuning you are basically working with the last layer of the network, so, I don’t know if fine-tuning the second time will reset the last layer, which the first time learned about astronomy.

Apologies if the answer is silly, I’ve been using BERT and not GPT2.

I think your twice-trained model would probably remember at least some of the astronomy training, as well as the physics training.

If you had a really huge corpus of physics texts it might overwrite your astronomy training, but I think it is unlikely. Some researchers have shown that many transformer models have a lot more capacity than they need. Also, there is probably overlap between physics and astronomy vocabulary.

When you fine-tune, you can define whether you want the model layers to be altered or frozen. You could consider gradual unfreezing of layers.
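As a rough sketch of what freezing and gradual unfreezing could look like with a GPT-2 model in `transformers`: every parameter has a `requires_grad` flag, and setting it to `False` keeps that weight fixed during training. The helper names `freeze_all_but_last_blocks` and `count_trainable`, the tiny config, and the unfreezing schedule are all made up for illustration; a real run would load pre-trained weights instead of a randomly initialised model.

```python
from transformers import GPT2Config, GPT2LMHeadModel


def freeze_all_but_last_blocks(model, n_trainable_blocks):
    """Freeze every weight, then unfreeze the last n transformer blocks."""
    for param in model.parameters():
        param.requires_grad = False
    if n_trainable_blocks > 0:
        # model.transformer.h is the list of GPT-2 transformer blocks.
        for block in model.transformer.h[-n_trainable_blocks:]:
            for param in block.parameters():
                param.requires_grad = True
        # Also train the final layer norm that sits after the last block.
        for param in model.transformer.ln_f.parameters():
            param.requires_grad = True


def count_trainable(model):
    """Number of weights that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Tiny randomly initialised GPT-2 so the sketch runs without a download.
# In practice one would load real weights instead, e.g.
# GPT2LMHeadModel.from_pretrained("gpt2").
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=32, n_layer=4, n_head=2)
model = GPT2LMHeadModel(config)
total = sum(p.numel() for p in model.parameters())

# Gradual unfreezing (hypothetical schedule): make more blocks trainable
# at the start of each epoch, then train as usual.
for epoch, n_blocks in enumerate([1, 2, 4]):
    freeze_all_but_last_blocks(model, n_blocks)
    print(f"epoch {epoch}: {count_trainable(model)} of {total} weights trainable")
    # ... run one epoch of training here ...
```

In this sketch the embeddings stay frozen throughout. Whether that is a good choice depends on the data, so treat it as one option rather than a recommendation.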


It all depends on how much data you have and on how long you train. If you have billions of texts and train for billions of steps, the astronomy part will hardly be remembered. What you can do, however, is merge the two datasets and shuffle them, so that the model sees both evenly throughout a single fine-tuning run.
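A minimal sketch of the merge-and-shuffle idea, using plain Python lists so that it runs on its own. The function name `merge_and_shuffle` and the example sentences are invented for illustration.

```python
import random


def merge_and_shuffle(astronomy_texts, physics_texts, seed=0):
    """Combine both corpora and shuffle them so the two topics are interleaved."""
    combined = list(astronomy_texts) + list(physics_texts)
    rng = random.Random(seed)  # fixed seed makes the order reproducible
    rng.shuffle(combined)
    return combined


# Placeholder examples, not real training data.
astronomy = ["Jupiter has many moons.", "A light-year is a unit of distance."]
physics = ["Force equals mass times acceleration.", "Energy is conserved."]

mixed = merge_and_shuffle(astronomy, physics, seed=42)
for text in mixed:
    print(text)
```

The model is then fine-tuned once on the mixed data, starting from gpt2, instead of twice in sequence. If the data lives in Hugging Face `datasets` objects, the same thing can be done with `concatenate_datasets([...])` followed by `.shuffle(seed=...)`.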

PS: it is not necessarily the case that only the last layer is fine-tuned. This is something that you, as the user, can decide.