I’m still confused:
"
if a model does not have a padding token already (which is common for decoder-only models because they are trained on blocks which do not have any padding). So you never “unlearn” anything.
"
is true, but if pad is simply set to eos, then during training both EOS and PAD get masked out of the labels (they share the same token id, so masking padding also masks the real EOS). That creates a “wrong” distribution shift: the model gets no loss signal for generating EOS. How do we fix this? See details above.
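
For what it’s worth, one common way out (just a sketch, not necessarily the fix discussed above; I’m assuming the usual Hugging Face `transformers` workflow where `tokenizer.pad_token = tokenizer.eos_token` was the workaround) is to register a dedicated pad token instead of reusing EOS, so masking pad positions in the labels no longer hides EOS from the loss:

```python
# Sketch: give the tokenizer a distinct [PAD] token rather than reusing EOS.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, just for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:
    # Add a new special token instead of aliasing EOS ...
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # ... and resize the embedding matrix so the new id has a row.
    model.resize_token_embeddings(len(tokenizer))

# Labels can now ignore padding without ever ignoring EOS:
batch = tokenizer(
    ["Hello world" + tokenizer.eos_token],
    padding="max_length",
    max_length=8,
    return_tensors="pt",
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # mask only PAD; EOS keeps its loss
```

With a distinct PAD id, the model still sees a training signal for emitting EOS at the end of each sequence, so generation learns when to stop.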