How to actually use padding in Llama tokenizers

I have been struggling to get padding to work properly with Llama-based models.

It seems that Llama does not define a pad token by default. Does this mean you simply can't have batch_size > 1?

Some suggestions on GitHub are to set pad_token = eos_token. But the issue with that is that pad_token_id is already set in the generation config (see generation_config.json · lmsys/vicuna-13b-delta-v1.1 at main).

There, pad_token_id is set to 0. Why is there an inconsistency? Does this matter, and should I also set pad_token_id to eos_token_id?
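For reference, a quick way to see where the two values come from is to load the tokenizer and the checkpoint's generation config side by side (a minimal sketch; the model id is the Vicuna checkpoint linked above, substitute your own):

```python
from transformers import AutoTokenizer, GenerationConfig

model_id = "lmsys/vicuna-13b-delta-v1.1"  # checkpoint from the linked generation_config.json
tokenizer = AutoTokenizer.from_pretrained(model_id)
gen_config = GenerationConfig.from_pretrained(model_id)

print(tokenizer.pad_token, tokenizer.pad_token_id)  # typically None / None for Llama tokenizers
print(gen_config.pad_token_id)                      # 0 in that generation_config.json
print(tokenizer.eos_token, tokenizer.eos_token_id)  # typically "</s>" / 2
```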

Currently I simply set pad_token = eos_token but keep pad_token_id as 0, and I am noticing poorer performance with batched inference compared to single-example inference.
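If you do want EOS to act as the pad token everywhere, one way to remove that mismatch is to set the token string and the ids together (a sketch, assuming `tokenizer` and a loaded `model` as above):

```python
# Use EOS as the pad token at the string level; the tokenizer derives pad_token_id from it.
tokenizer.pad_token = tokenizer.eos_token

# generate() takes its default pad_token_id from the generation config, which may still be 0
# (id 0 is <unk> in the Llama vocab), so override it as well to keep everything consistent.
model.generation_config.pad_token_id = tokenizer.eos_token_id
```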


@vikalex how did you manage to solve this issue? Help please!

I just set eos_token as the pad_token and it doesn't cause any issues, as long as the model knows both of them and has a good stopping criterion.
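For anyone landing here later, a minimal sketch of that approach for batched generation (the model id and prompts are placeholders):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"  # substitute your Llama checkpoint

# Decoder-only models are usually left-padded for generation.
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as PAD

batch = tokenizer(
    ["Hello, how are you?", "Tell me a joke about tokenizers."],
    return_tensors="pt",
    padding=True,
)

outputs = model.generate(
    **batch,
    max_new_tokens=32,
    pad_token_id=tokenizer.eos_token_id,  # make generate() pad with the same id
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Passing pad_token_id explicitly also avoids the "Setting `pad_token_id` to `eos_token_id`" warning that generate() would otherwise print.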