I am fine-tuning the smallest version of GPT-2 (distilgpt2) on my own dataset. The dataset consists only of texts, and an EOS token is inserted after some of them.
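To make the setup concrete, the training texts are put together roughly like the sketch below. The helper name build_training_texts, the is_complete flag, and the sample strings are made up for illustration; only the EOS string `<|endoftext|>` is the actual GPT-2 / distilgpt2 token.

```python
# Hypothetical sketch of the data preparation; names and sample texts are illustrative.
EOS_TOKEN = "<|endoftext|>"  # EOS token string used by GPT-2 / distilgpt2


def build_training_texts(samples):
    """Append the EOS token to every text that is flagged as complete.

    `samples` is a list of (text, is_complete) pairs.
    """
    texts = []
    for text, is_complete in samples:
        texts.append(text + EOS_TOKEN if is_complete else text)
    return texts


samples = [
    ("First example text.", True),                       # gets an EOS token
    ("Second example text, continued elsewhere", False),  # left as-is
]
training_texts = build_training_texts(samples)
```

So only some of the training strings actually end with the EOS token.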
Training runs fine and the loss decreases steadily.
But when I call model.generate(input_ids, …), the model always keeps producing tokens until max_length is reached, no matter what.
I suspect the model has not learned to assign a high enough probability to the EOS token.
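One way to check this suspicion would be to look at the softmax probability of the EOS id in the next-token logits at a position where a text should end. The following is only a minimal sketch in plain Python: the function name eos_probability is made up, and it assumes the logits for one step are already available as a flat list of floats (for example taken from the last position of the model's logits output). 50256 is the id of `<|endoftext|>` in the GPT-2 vocabulary.

```python
import math

EOS_TOKEN_ID = 50256  # id of <|endoftext|> in the GPT-2 / distilgpt2 vocabulary


def eos_probability(logits, eos_token_id=EOS_TOKEN_ID):
    """Return the softmax probability of the EOS id for one step's logits.

    `logits` is a flat sequence of floats, one entry per vocabulary id
    (assumption: extracted from the model output beforehand).
    """
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    return exps[eos_token_id] / sum(exps)


# Toy check with a 4-token vocabulary where id 3 plays the role of EOS.
toy_uniform = eos_probability([0.0, 0.0, 0.0, 0.0], eos_token_id=3)
toy_peaked = eos_probability([0.0, 0.0, 0.0, 5.0], eos_token_id=3)
```

If this value stays close to zero at the end of the training texts, that would support the idea that the model has not learned when to stop.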
Any tips on how to improve this, or how to get the model to generate EOS at the end of a text?