I have a question regarding evaluation when fine-tuning GPT-2. My task is training GPT-2 to write short stories. The input file consists of a small set of stories, each with the following structure:
t Title kw outline b body
For example: _t_Harry Potter _kw_Harry goes to Hogwarts _b_Story
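In other words, each training line is assembled like this (a trivial sketch; `make_example` is just a hypothetical helper name, and I'm assuming the body marker is `_b_` like the other two):

```python
# Hypothetical helper showing how each training line is built from
# the three fields. The marker strings are the ones in my data file.
def make_example(title: str, outline: str, body: str) -> str:
    return f"_t_{title} _kw_{outline} _b_{body}"

print(make_example("Harry Potter", "Harry goes to Hogwarts", "Story"))
```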
My goal is to give GPT-2 the title and outline as prompt and have it generate the body.
I have added all three special tokens to the model (t, kw, b), but the more I train, the higher the perplexity gets (ironically, the generations improve from a human point of view).
Did I miss something? Should I change the evaluation so that the model is given the title + outline as a prompt and only the generated body is scored? If so, where exactly would I need to look to change it?
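To make the question concrete, here is my understanding of the two evaluations, as a pure-Python sketch with made-up per-token log-probabilities (not my actual numbers):

```python
import math

# Made-up per-token log-probabilities for one example.
# "prompt" = title + outline tokens, "body" = story tokens.
prompt_logps = [-6.0, -5.5, -5.0]        # model predicts the prompt poorly
body_logps   = [-1.5, -1.2, -1.8, -1.5]  # but predicts the body well

def perplexity(logps):
    # PPL = exp(-mean per-token log-probability)
    return math.exp(-sum(logps) / len(logps))

ppl_all  = perplexity(prompt_logps + body_logps)  # what I evaluate now
ppl_body = perplexity(body_logps)                 # what I think I should evaluate

print(ppl_all, ppl_body)  # body-only perplexity is lower here
```

If I understand correctly, with a Hugging Face-style training loop the usual way to get the body-only version is to set the prompt positions in the `labels` tensor to -100, which `CrossEntropyLoss` ignores by default; I'm just not sure where in my script that change belongs.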
Thanks for the help!