Padding Token Missing from LLaMA

Hi all,
I’m attempting to train a Llama model on a custom JSON file and I cannot get past an error that reads:

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

I’ve tried both of the suggested fixes and still get the same error. I’ve also looked around online, and it appears the problem is that Llama does not ship with a built-in pad token and won’t accept the bos_token or eos_token in its place, but there does not seem to be an established fix. Does anyone know of a workaround or working code that solves this?


A compromise seems possible, but it’s unclear whether it’s the right solution…
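A minimal sketch of that compromise, assuming the `transformers` library and using the small, ungated `hf-internal-testing/llama-tokenizer` checkpoint as a stand-in (substitute the Llama path you are actually training): either reuse `eos_token` as the pad token, or register a dedicated `[PAD]` token and resize the model’s embeddings so the new token id is valid.

```python
from transformers import AutoTokenizer

# Placeholder checkpoint for illustration; substitute your own Llama path.
checkpoint = "hf-internal-testing/llama-tokenizer"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Option 1: reuse the existing end-of-sequence token as the pad token.
tokenizer.pad_token = tokenizer.eos_token

# Option 2: add a dedicated [PAD] token instead. If you do this, the model's
# embedding matrix must also grow to cover the new token id:
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})
# model.resize_token_embeddings(len(tokenizer))

# Padding should now succeed instead of raising the ValueError.
batch = tokenizer(["short", "a somewhat longer sequence"], padding=True)
print(tokenizer.pad_token)
```

One thing worth checking if the error persists after applying either fix: the pad token has to be set on the same tokenizer instance that is later passed to the `Trainer` or data collator, and before training starts, or the collator will still see a tokenizer without a pad token.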