GPT2 finetuned with eos token will never yield eos token during generation

darkraipro · March 7, 2022, 10:30am

Hi,

I am fine-tuning the smallest version of GPT-2 (distilgpt2) on a dataset. The dataset consists only of texts, and an EOS token is inserted after some of them.
Training runs fine and the loss decreases steadily.
But with model.generate(input_ids, …), the model always outputs tokens until max_length is reached, no matter what.

I suspect the model has not learned to assign enough probability to the EOS token.
Any tips to improve this, or to make the model generate EOS after some texts?

zhensuuu · May 31, 2022, 7:14am

I have encountered the same problem. Have you found any solutions?

zhensuuu · June 2, 2022, 1:05pm

Finally, I found the reason.

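The DataCollatorForLanguageModeling always masks the pad_token in the labels, and I had set pad_token = eos_token. As a result, the EOS tokens were masked out of the loss as well, so the model never learned to predict them.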
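
To illustrate the effect, here is a minimal pure-Python sketch. It is not the library's code: it only mimics the label step the collator applies with mlm=False (copy input_ids into labels, then set every position equal to pad_token_id to -100, the value PyTorch's cross-entropy loss ignores by default). The helper name mask_pad_labels, the token ids in the example, and the separate pad id 50257 are assumptions made for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_pad_labels(input_ids, pad_token_id):
    """Hypothetical helper: copy the ids and hide every position holding the pad id."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


EOS_ID = 50256  # GPT-2's <|endoftext|> id
PAD_ID = 50257  # assumed id of a newly added, dedicated pad token

# Case 1: pad_token = eos_token. The sequence is text, a real EOS, then padding.
shared = [15496, 995, EOS_ID, EOS_ID, EOS_ID]
labels_shared = mask_pad_labels(shared, pad_token_id=EOS_ID)

# Case 2: a dedicated pad token. Only the padding is hidden.
separate = [15496, 995, EOS_ID, PAD_ID, PAD_ID]
labels_separate = mask_pad_labels(separate, pad_token_id=PAD_ID)

print(labels_shared)    # the real EOS is masked together with the padding
print(labels_separate)  # the real EOS survives as a training target
```

In the first case no EOS ever appears as a target, which matches the behaviour described in this thread. If you go the dedicated-pad-token route in transformers, the usual calls are tokenizer.add_special_tokens({"pad_token": "<|pad|>"}) followed by model.resize_token_embeddings(len(tokenizer)); the token string here is only an example.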

larsdw · May 26, 2023, 9:17am

Hi, I am encountering the same problem. How did you resolve this? Did you change the pad_token to something else?

JoaoLages · May 28, 2023, 1:37am

I have the same problem, the model does not shut up…

amilios · August 5, 2023, 2:15am

I believe the most elegant solution may be to switch to the Seq2Seq DataCollator, as described here. Otherwise, you can introduce a new padding token.
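
A rough sketch of why the Seq2Seq-style collation helps, assuming each example already carries its own labels column (for causal LM fine-tuning, typically a copy of input_ids). In transformers this corresponds to DataCollatorForSeq2Seq, which pads labels with label_pad_token_id (default -100) rather than masking by token value. The pad_batch function below is a hypothetical stand-in written in plain Python, not the library implementation, and the token ids are illustrative.

```python
def pad_batch(features, pad_token_id, label_pad_token_id=-100):
    """Hypothetical stand-in: right-pad inputs and labels independently.

    Assumes every feature has "input_ids" and "labels" of equal length.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        # Labels are padded with -100, so only the added positions are ignored.
        batch["labels"].append(f["labels"] + [label_pad_token_id] * n_pad)
    return batch


EOS_ID = 50256  # GPT-2's <|endoftext|> id, also used as the pad token here

features = [
    {"input_ids": [15496, 995, EOS_ID], "labels": [15496, 995, EOS_ID]},
    {"input_ids": [15496, EOS_ID], "labels": [15496, EOS_ID]},
]
batch = pad_batch(features, pad_token_id=EOS_ID)
print(batch["input_ids"][1])  # padded with the EOS id
print(batch["labels"][1])     # real EOS kept, padding position ignored
```

Because padding is decided by position instead of by token id, pad_token = eos_token no longer removes the real EOS from the loss.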

Sam1989 · April 12, 2024, 9:10am

I faced the same problem.
Fine-tuning Gemma 2 looked fine judging by the loss.
But after training, the prediction was just eos eos.

The solution in my case was simple:
Set add_eos_token to False when loading the tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=False)
