LLaMA pad token

I noticed that the LLaMA tokenizer does not have a pad token, so I tried to add one manually. This is what I have done:

```python
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))
model.model.embed_tokens.padding_idx = 32000
```
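
For context, the fuller version of what I'm running looks roughly like this (the checkpoint name is just a placeholder for my base model, and setting `model.config.pad_token_id` is something I added based on my understanding, not something I've confirmed is required):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute your own base model.
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Register the new pad token and grow the embedding matrix to match
# the enlarged vocabulary.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

# Keep the model config in sync so generate() and Trainer see the pad id.
model.config.pad_token_id = tokenizer.pad_token_id
```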
Is this the correct way of adding pad tokens?
After adding the pad token to my tokenizer, my fine-tuned LLaMA model stops generation with `</s><s>`.
Is this behaviour expected? I thought the model should stop with an EOS token only, so I don't understand why it stops with an EOS token followed by a BOS token.
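
For reference, this is roughly how I observe the behaviour (the prompt and generation settings here are a minimal repro sketch, not my exact script):

```python
prompt = "Question: What is the capital of France?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=64)

# Decoding without skipping special tokens is how I noticed the
# trailing </s><s> pair at the end of the generation.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```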
