LLaMA pad token

I noticed that the LLaMA tokenizer does not have a pad token, so I tried to add one manually. This is what I have done:

tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))
model.model.embed_tokens.padding_idx = 32000
Is this the correct way of adding a pad token?
After adding the pad token to my tokenizer, my fine-tuned LLaMA model stops generation with </s><s>.
Is this behaviour expected? I thought the model should stop with an EOS token only, but I don't understand why it stops with an EOS token followed by a BOS token.
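
For completeness, here is a minimal self-contained sketch of this setup. The checkpoint name is just an illustrative example, and the last config line is the commonly suggested extra step rather than part of my original snippet:

# Sketch only: the checkpoint name below is illustrative.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")

# Register the new pad token, then grow the embedding matrix to match
# the enlarged vocabulary (the new row is randomly initialised).
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

# Commonly suggested in addition: record the pad id in the model config
# so generation and attention masking agree with the tokenizer.
model.config.pad_token_id = tokenizer.pad_token_id

Without the resize_token_embeddings call, the new pad token id would index past the end of the embedding matrix and raise an error.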


Currently experiencing the same issue. Any updates on a solution?


It’s not really a bug, but it seems to be an unsolved issue.


I'm hitting ValueError: Cannot handle batch sizes > 1 if no padding token is defined., and the error persists despite running:

tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id
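
A self-contained sketch of how I understand this is supposed to fit together. The model class and checkpoint name are assumptions on my part; this particular ValueError is raised by the sequence-classification forward pass when config.pad_token_id is unset:

# Sketch under the assumption that a sequence-classification head is in
# use; that head is what raises "Cannot handle batch sizes > 1 if no
# padding token is defined." The checkpoint name is illustrative.
from transformers import LlamaForSequenceClassification, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForSequenceClassification.from_pretrained("huggyllama/llama-7b")

# Reuse EOS as padding; the tokenizer and the model config must agree on
# the id, and both must be set before the first batched forward pass.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

batch = tokenizer(["first example", "second example"],
                  padding=True, return_tensors="pt")
out = model(**batch)  # batch size 2 should now pad without the ValueError

One thing worth checking is that both assignments run on the same model and tokenizer objects that actually perform the batched forward pass, and that they run before batching begins.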
