When is a generative model said to overfit?

If I train a causal language model, should I be worried about overfitting? If so, what would that imply? That it cannot generalize well to unseen prompts?

I am used to validating on downstream tasks and selecting the best checkpoint at the point where validation loss has not yet started to diverge from training loss (i.e., before overfitting), but I am not sure whether that applies to CLM/generation tasks.

I guess what I am asking is:

  • do you validate your (C)LM/generation models during training as a means of early stopping / finding the best checkpoint?
  • if you do not, how do you decide how long to train?

For generative models, one typically measures perplexity on a held-out dataset. As long as the held-out perplexity keeps improving (i.e., decreasing), keep training.
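
For reference, here is a minimal sketch of computing held-out perplexity for a causal LM. It assumes a Hugging Face `transformers` checkpoint (the `gpt2` model name and the example texts are just placeholders); perplexity is simply exp of the mean per-token negative log-likelihood, so lower is better.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

held_out_texts = [
    "example held-out document one.",
    "example held-out document two.",
]

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in held_out_texts:
        enc = tokenizer(text, return_tensors="pt")
        # With labels == input_ids, the model returns the mean cross-entropy
        # over the predicted tokens; multiply by the token count to accumulate.
        out = model(**enc, labels=enc["input_ids"])
        n_tokens = enc["input_ids"].size(1) - 1  # first token is never predicted
        total_nll += out.loss.item() * n_tokens
        total_tokens += n_tokens

perplexity = math.exp(total_nll / total_tokens)
print(f"held-out perplexity: {perplexity:.2f}")
```

If you evaluate with the `Trainer`, you can get the same number from `math.exp(trainer.evaluate()["eval_loss"])`, since the reported eval loss is already the mean token-level cross-entropy.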

So it is similar to how you would otherwise track overfitting, only with perplexity (PPL) as the metric. I find that my eval loss increases quite rapidly when finetuning, yet the generations are still not really sensible, nor adapted to the new domain. If you have any ideas, shoot!
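
If it helps, this is roughly how I wire up checkpoint selection and early stopping on eval loss (equivalently, perplexity). A minimal sketch, assuming the Hugging Face `Trainer` API; `model`, `train_dataset`, and `eval_dataset` are placeholders for your own objects, and argument names may differ slightly across `transformers` versions.

```python
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="clm-finetune",
    evaluation_strategy="steps",      # evaluate on the held-out set during training
    eval_steps=500,
    save_strategy="steps",            # must match the eval strategy for best-model loading
    save_steps=500,
    load_best_model_at_end=True,      # reload the best checkpoint when training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,          # lower loss / perplexity is better
)

trainer = Trainer(
    model=model,                      # placeholder: your causal LM
    args=args,
    train_dataset=train_dataset,      # placeholder: your tokenized datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```

With this setup, training stops once eval loss has not improved for three consecutive evaluations, and the best checkpoint is what you keep.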

I have the same problem. Any updates?