Why does the falcon QLoRA tutorial code use eos_token as pad_token?

I’m still confused about this part:

"
if a model does not have a padding token already (which is common for decoder-only models because they are trained on blocks which do not have any padding). So you never “unlearn” anything.
"

This is true, but then during training both EOS and PAD positions get masked out of the loss, and since pad == eos, every EOS token is treated as padding. So there is a “wrong” distribution shift for generating EOS: the model never gets a loss signal for emitting EOS and may not learn when to stop generating. How to fix this? See details above.
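To make the concern concrete, here is a rough sketch (not the tutorial's exact code) of the two setups I mean. The checkpoint name and the `[PAD]` string are just placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-7b"  # placeholder: any decoder-only checkpoint without a pad token
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Option A (what the tutorial does): reuse EOS as the padding token.
# A collator that masks pad positions in the labels will then also mask every EOS,
# so the model gets no loss signal for emitting EOS and may never stop generating.
tokenizer.pad_token = tokenizer.eos_token

# Option B (a common workaround): add a dedicated pad token instead,
# so only real padding is masked and the final EOS still contributes to the loss.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
```

Option B costs a small embedding resize, but it keeps EOS in the training signal, which is exactly the part I'm worried about with option A.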
