Hi all! There’s an interesting story here.
In general you are correct that causal LMs like Falcon are not trained with a pad token, and so the tokenizer does not have one set. This is true for a lot of causal LMs on the Hub. During training, these models are often fed sequences that have been concatenated together and truncated at the maximum sequence length, so there is never any empty space that needs padding.
The reason we add one later is that a lot of downstream methods use padding and attention masks in some way. However, in many cases it doesn't really matter what you set the padding token to! This is because the padded positions will generally be masked by setting the `attention_mask` to 0 there, so those tokens will not be attended to by the rest of the sequence.
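As a quick sketch of what that looks like in practice (assuming the `tiiuae/falcon-7b` checkpoint here, but any causal LM without a pad token behaves the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

# Falcon's tokenizer has no pad token, so we reuse the eos token for padding.
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["Short sequence", "A somewhat longer sequence that needs less padding"],
    padding=True,
    return_tensors="pt",
)

# Padded positions get attention_mask == 0, so they are never attended to,
# which is why the exact choice of pad token usually doesn't matter.
print(batch["input_ids"])
print(batch["attention_mask"])
```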
However, one place the choice of padding token can matter is in the labels when fine-tuning the model. This is because in standard CLM training, the labels are the inputs shifted by a single position, which means the label for the final real token before the padding is the padding token itself. When training models on shorter sequences (such as for chat), we generally want them to mark the end of the text they've generated with a token like `eos_token`. As a result, we commonly just use `eos_token` as the padding token.
However, depending on your fine-tuning task, you may not want the model to learn to predict `eos_token` at the end of a sequence. If that's the case, simply change the label at that position to the token you do want, or set the label to `-100` so that position is ignored by the loss.
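For example, one common way to do that masking, reusing the toy ids from the sketch above:

```python
import torch

input_ids = torch.tensor([[5, 8, 3, 11, 11]])
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])

# Set the labels at padded positions to -100, which the cross-entropy loss in
# transformers ignores, so the model is not trained to predict eos/pad there.
labels = input_ids.clone()
labels[attention_mask == 0] = -100
print(labels)  # tensor([[   5,    8,    3, -100, -100]])
```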
Does that answer the questions you had? Feel free to let me know if I missed anything here!