GPT-2 fine-tuned with an EOS token never yields the EOS token during generation

Hi,

I fine-tuned the smallest version of GPT-2 (distilgpt2) on a dataset. The dataset consists only of texts, and after some texts an EOS token is inserted.
Training runs decently; the loss decreases steadily.
But with model.generate(input_ids, …), no matter what I do, the model always outputs tokens until max_length is reached.

I think the probability of the EOS token has not been learned well enough by the model.
Any tips to improve this or to make the model generate an EOS token after some text?


I have encountered the same problem. Have you found any solutions?

Finally, I found the reason.

DataCollatorForLanguageModeling always masks pad_token positions in the labels (it replaces them with -100), and I had set pad_token = eos_token, so every EOS token was masked out of the labels and the model never learned to predict it.
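For anyone hitting this later, a minimal sketch of one possible fix (distilgpt2 here is just an example checkpoint): give the tokenizer a dedicated pad token instead of reusing EOS, so the collator only masks real padding positions and the EOS labels survive:

from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Add a dedicated [PAD] token instead of reusing the EOS token.
# DataCollatorForLanguageModeling replaces pad_token positions in the labels
# with -100, so with a separate pad token the EOS positions keep their labels
# and contribute to the loss.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)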


Hi, I am encountering the same problem. How did you resolve this? Did you change the pad_token to something else?

I have the same problem; the model does not shut up…

I believe the most elegant solution may be to switch to using DataCollatorForSeq2Seq, as described here; otherwise, you can introduce a new padding token.
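Roughly, the DataCollatorForSeq2Seq route could look like the sketch below (untested; distilgpt2 and the "text" column are just placeholders). The point is that the labels are supplied explicitly and padded with -100, so the appended EOS keeps its label:

from transformers import AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # inputs can still be padded with EOS

def tokenize(example):
    ids = tokenizer(example["text"] + tokenizer.eos_token)["input_ids"]
    # Provide labels explicitly; DataCollatorForSeq2Seq pads the labels with
    # label_pad_token_id (-100), so the appended EOS is not masked.
    return {"input_ids": ids, "labels": ids.copy()}

data_collator = DataCollatorForSeq2Seq(tokenizer, label_pad_token_id=-100)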

I faced the same problem.
Fine-tuning Gemma 2 went well according to the loss, but after training the model predicted nothing but EOS tokens.

The solution in my case was simple: set add_eos_token to False when loading the tokenizer.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=False)
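As a quick sanity check after fine-tuning (the prompt and token budget below are arbitrary), generation should now stop at the EOS token instead of running to the limit:

inputs = tokenizer("Some prompt", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=False))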