Hi,
I am fine-tuning the smallest version of GPT-2 (distilgpt2) on a dataset. The dataset consists only of texts, and after some texts an EOS token is inserted.
Training runs fine and the loss is steadily decreasing.
But with model.generate(input_ids, …), no matter what I try, the model always outputs tokens until max_length is reached.
I think the model has not learned to assign enough probability to the EOS token.
Any tips on how to improve this or get the model to generate EOS at the end of a text?
I have encountered the same problem. Have you found any solutions?
Finally, I found the reason.
The DataCollatorForLanguageModeling always masks the pad_token in the labels, and I had set pad_token = eos_token, so the EOS positions were excluded from the loss and the model never learned to predict EOS.
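To make that concrete, here is a minimal sketch (my own example with distilgpt2, not the original training code) showing the masking:

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # the problematic setting

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
batch = collator([tokenizer("some text" + tokenizer.eos_token)])

# Every position whose id equals pad_token_id (identical to eos_token_id here)
# is replaced by -100 in the labels, so the EOS token never contributes to the
# loss and the model gets no signal to predict it.
print(batch["labels"])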
Hi, I am encountering the same problem. How did you resolve this? Did you change the pad_token to something else?
I have the same problem; the model does not shut up…
I believe the most elegant solution may be to switch to the Seq2Seq data collator (DataCollatorForSeq2Seq) as described here; otherwise you can introduce a new padding token.
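If it helps, a minimal sketch of the second option (using distilgpt2 purely as an example): add a dedicated pad token so the collator only masks real padding and leaves EOS in the labels.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # new, dedicated pad token

model = AutoModelForCausalLM.from_pretrained("distilgpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the added token
model.config.pad_token_id = tokenizer.pad_token_id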
I faced the same problem.
Fine-tuning Gemma 2 went well according to the loss.
But after training, the predictions were just EOS tokens.
The solution in my case was simple:
set add_eos_token=False when loading the tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(model_id)  # model_id: the Gemma 2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=False)
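As a quick sanity check after loading things that way (the prompt here is made up; model and tokenizer are the ones from above), generation should now stop once EOS is produced:

inputs = tokenizer("Write one short sentence about the sea.", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as EOS is generated
)
print(tokenizer.decode(output[0], skip_special_tokens=True))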