How to actually use padding in Llama tokenizers

I have been struggling to get padding to work properly with Llama-based models.

It seems that Llama does not define a pad token by default. Does this mean you simply can't have batch_size > 1?

Some suggestions on GitHub are to set pad_token = eos_token. But the issue with that is that pad_token_id is already set in the generation config (see generation_config.json · lmsys/vicuna-13b-delta-v1.1 at main).

There, pad_token_id is set to 0. Why is there an inconsistency? Does this matter, and should I also set pad_token_id to eos_token_id?
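For reference, a quick way to see where the two values come from is to load the tokenizer and the checkpoint's generation config side by side (a minimal sketch; the model id is the Vicuna checkpoint linked above, substitute your own):

```python
from transformers import AutoTokenizer, GenerationConfig

model_id = "lmsys/vicuna-13b-delta-v1.1"  # checkpoint from the linked generation_config.json
tokenizer = AutoTokenizer.from_pretrained(model_id)
gen_config = GenerationConfig.from_pretrained(model_id)

print(tokenizer.pad_token, tokenizer.pad_token_id)  # typically None / None for Llama tokenizers
print(gen_config.pad_token_id)                      # 0 in that generation_config.json
print(tokenizer.eos_token, tokenizer.eos_token_id)  # typically "</s>" / 2
```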

Currently I simply set pad_token = eos_token but keep pad_token_id as 0, and I am noticing poorer performance with batched inference compared to single-example inference.
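If you do want EOS to act as the pad token everywhere, one way to remove that mismatch is to set the token string and the ids together (a sketch, assuming `tokenizer` and a loaded `model` as above):

```python
# Use EOS as the pad token at the string level; the tokenizer derives pad_token_id from it.
tokenizer.pad_token = tokenizer.eos_token

# generate() takes its default pad_token_id from the generation config, which may still be 0
# (id 0 is <unk> in the Llama vocab), so override it as well to keep everything consistent.
model.generation_config.pad_token_id = tokenizer.eos_token_id
```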


@vikalex how did you manage to solve this issue? Help please!

I just set eos_token as the pad_token and it doesn't cause any issues, as long as the model knows both of them and has a good stopping criterion.
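For anyone landing here later, a minimal sketch of that approach for batched generation (the model id and prompts are placeholders):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # substitute your Llama checkpoint

# Decoder-only models are usually left-padded for generation.
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as PAD

batch = tokenizer(
    ["Hello, how are you?", "Tell me a joke about tokenizers."],
    return_tensors="pt",
    padding=True,
)

outputs = model.generate(
    **batch,
    max_new_tokens=32,
    pad_token_id=tokenizer.eos_token_id,  # make generate() pad with the same id
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Passing pad_token_id explicitly also avoids the "Setting `pad_token_id` to `eos_token_id`" warning that generate() would otherwise print.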