How to set the Pad Token for meta-llama/Llama-3 Models

You could try asking the model authors in a discussion on the model page or on their GitHub, but I doubt you would get a response.

The short answer is that the choice of padding token is not that important as long as you are consistent. Moreover, if you use Flash Attention 2, the inputs are unpadded before attention is computed, so the padding tokens are effectively removed and don't matter at all.
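For context, here is roughly how Flash Attention 2 is enabled in transformers (a minimal sketch; it assumes the flash-attn package is installed and your GPU supports it):

```python
import torch
from transformers import AutoModelForCausalLM

# With this attention implementation, padded batches are unpadded
# internally before the attention kernel runs.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```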

If you aren't using Flash Attention 2, you should be careful about the padding token, because many data collators mask padding tokens out of the loss (typically by setting their labels to -100). If you set the padding token to be the same as the EOS token, those collators will mask the EOS token too, so the model will never learn when to stop generating.
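One common workaround (a sketch of a community convention, not an official recommendation from the model authors) is to repurpose one of Llama-3's unused reserved special tokens as the pad token, which keeps the pad id distinct from EOS without touching the embedding matrix:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama-3's vocabulary ships unused tokens of the form
# <|reserved_special_token_N|>; reusing one keeps pad != eos
# without growing the vocabulary.
tokenizer.pad_token = "<|reserved_special_token_0|>"
model.config.pad_token_id = tokenizer.pad_token_id

# Sanity check: collators can now mask padding from the loss
# without also masking the stop token.
assert tokenizer.pad_token_id != tokenizer.eos_token_id
```

The alternative is to add a brand-new token via `tokenizer.add_special_tokens({"pad_token": "<pad>"})` and then call `model.resize_token_embeddings(len(tokenizer))`, but that changes the vocabulary size, which some deployment stacks don't handle gracefully.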

If you only use the model for inference in TGI or vLLM, the padding token doesn't matter, since those servers batch requests without relying on padding.
