Increasing Perplexity when fine-tuning GPT-2


I have a question regarding evaluation when fine-tuning GPT-2. My task is training GPT-2 to write a short story. The input file consists of a small set of stories, each with the following structure:

_t_ Title _kw_ outline _b_ body

For example: _t_ Harry Potter _kw_ Harry goes to Hogwarts _b_ Story

My goal is to give GPT-2 the title and outline as a prompt and have it generate the body.

I have added all three special tokens (_t_, _kw_, _b_) to the model, but the more I train, the higher the perplexity gets. Ironically, the generations improve from a human point of view.

Did I miss something? Should I change the evaluation so that it gets the title and outline as context and scores only the body? If so, where exactly would I need to look to change it?
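To make the body-only evaluation idea concrete, here is a minimal sketch in plain Python. It assumes the per-token natural-log probabilities have already been obtained from the model, and that label positions set to -100 are skipped, which is the default ignore index of PyTorch's cross-entropy loss. The function names and the token ids in the usage note are hypothetical, chosen only for illustration.

```python
import math

# Label value skipped by PyTorch's cross-entropy loss by default.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids, body_token_id):
    """Copy input_ids into labels, hiding the prompt.

    Everything up to and including the body marker is replaced with
    IGNORE_INDEX, so only the body tokens are scored.
    """
    body_start = input_ids.index(body_token_id) + 1
    return [IGNORE_INDEX] * body_start + list(input_ids[body_start:])


def body_perplexity(token_log_probs, labels):
    """Perplexity over the positions whose label is not IGNORE_INDEX.

    token_log_probs[i] is assumed to be the natural-log probability the
    model assigned to input_ids[i] given the tokens before it.
    """
    kept = [lp for lp, lab in zip(token_log_probs, labels) if lab != IGNORE_INDEX]
    if not kept:
        raise ValueError("no body tokens to score")
    return math.exp(-sum(kept) / len(kept))
```

For instance, with hypothetical ids 50257, 50258 and 50259 for the three added tokens, `mask_prompt_labels([50257, 10, 11, 50258, 12, 50259, 13, 14], 50259)` keeps only the last two positions, and `body_perplexity` then averages the log probabilities over those two tokens alone. Comparing this number with the perplexity over the full sequence would show whether the increase comes from the body or from the title and outline part.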

Thanks for the help!
