GPT-2 fine-tuned with EOS tokens never yields an EOS token during generation


I am fine-tuning the smallest version of GPT-2 (distilgpt2) on a dataset. The dataset consists only of plain texts, and an EOS token is inserted at the end of each text.
Training runs fine and the loss decreases steadily.
But when I call model.generate(input_ids, …), the model always keeps producing tokens until max_length is reached, no matter what.

I suspect the model has not learned to assign enough probability to the EOS token.
Any tips on how to improve this, or how to make the model emit EOS at the end of a text?


I have encountered the same problem. Have you found any solutions?

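I finally found the reason.

In causal LM mode (mlm=False), DataCollatorForLanguageModeling always sets the labels of every pad_token position to -100, so those positions are ignored by the loss. Because I had set pad_token = eos_token, every EOS token in the labels was masked out as well, and the model never received a training signal to predict EOS.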
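To make the effect concrete, here is a minimal pure-Python sketch that mimics the label logic described above. It is an illustration, not the transformers source: the function names are hypothetical, the token ids are arbitrary except 50256 (GPT-2's `<|endoftext|>` id), and the "fix" function is one possible workaround that assumes each batch carries an attention_mask where padding positions are 0.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss ignores by default
EOS = 50256          # GPT-2 <|endoftext|> id, here also used as the pad token


def labels_masking_pad_id(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Mimics the collator: any position whose id equals pad_token_id is ignored.
    When pad == eos, real EOS tokens are ignored too."""
    return [[IGNORE_INDEX if tok == pad_token_id else tok for tok in row] for row in input_ids]


def labels_masking_padding_only(input_ids: List[List[int]],
                                attention_mask: List[List[int]]) -> List[List[int]]:
    """Hypothetical workaround: ignore only true padding (attention_mask == 0),
    so a real EOS at the end of a text stays a training target."""
    return [
        [tok if m == 1 else IGNORE_INDEX for tok, m in zip(row, mask)]
        for row, mask in zip(input_ids, attention_mask)
    ]


# Two texts, each ending in a real EOS; the first is padded with one extra EOS.
batch = [[464, 3290, EOS, EOS], [464, 318, 257, EOS]]
attention = [[1, 1, 1, 0], [1, 1, 1, 1]]

print(labels_masking_pad_id(batch, EOS))
# [[464, 3290, -100, -100], [464, 318, 257, -100]]  -> EOS is never a target
print(labels_masking_padding_only(batch, attention))
# [[464, 3290, 50256, -100], [464, 318, 257, 50256]]  -> real EOS kept, padding ignored
```

Another common workaround is to give the tokenizer a dedicated pad token, e.g. `tokenizer.add_special_tokens({"pad_token": "[PAD]"})` followed by `model.resize_token_embeddings(len(tokenizer))`, so that padding and EOS no longer share an id and the collator's masking only hits real padding.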