The generation_config.json files for the Llama-2-hf models explicitly set top_p=0.6 (e.g. https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/main/generation_config.json). I am wondering why this is the case. In my experience, top_p=0.6 easily leads to very repetitive text. Wouldn’t it be better to just not include these sampling defaults in the generation config?
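In the meantime, the defaults can be overridden per call, since arguments passed to generate() take precedence over the values stored in generation_config.json. A minimal sketch, assuming the transformers library is installed and you have access to the gated meta-llama/Llama-2-7b-hf checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Explicit kwargs override the checkpoint's generation_config.json
# defaults; these values match the official llama example scripts.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```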
These parameters were specifically set by the Llama team, and probably come from their experiments with the model!
Thanks for the response! I guess you probably won’t know the answer to this, but is it possible that the temperature and top_p were somehow swapped by accident? In the official Llama repo, a temperature of 0.6 and a top_p of 0.9 are used (https://github.com/facebookresearch/llama/blob/main/example_text_completion.py, https://github.com/facebookresearch/llama/blob/main/example_chat_completion.py), whereas the generation_config.json has them swapped: a temperature of 0.9 and a top_p of 0.6.
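For reference, here is a quick way to inspect the defaults shipped with the checkpoint (a minimal sketch, assuming the transformers library and access to the gated repo; the values in the comment are what the config contained at the time of writing):

```python
from transformers import GenerationConfig

# Load only the generation defaults, without downloading the model weights.
gen_config = GenerationConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
print(gen_config.temperature, gen_config.top_p)  # 0.9 0.6 -- reversed vs. the example scripts
```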
Ah, in that case I would need to look at the commits; they might indeed have been swapped! Thanks for noticing!