meta-llama/Llama-2-7b-chat-hf does not generate a response when the prompt is long

Okay, I finally found a solution by asking in the llama-recipes GitHub repo (see the issue Huggingface meta-llama/Llama-2-7b-chat-hf model not generate response when prompt is long · Issue #219 · facebookresearch/llama-recipes · GitHub).

To use the Llama 2 chat model, the prompt needs to follow a specific format, which includes the [INST] and <<SYS>> tags, the BOS and EOS tokens, etc. The format_tokens() function in llama-recipes (https://github.com/facebookresearch/llama-recipes/blob/main/examples/chat_completion/chat_completion.py#L83) shows how to do the formatting.
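
For reference, here is a minimal sketch of what that format looks like for a single system + user turn. The B_INST/B_SYS constants mirror the ones used in llama-recipes; the model name is the real HuggingFace ID, but the system prompt, user message, and generation settings are just illustrative placeholders:

```python
# Minimal sketch of the Llama 2 chat prompt format (single system + user turn).
# Assumes transformers is installed and you have access to the gated model.
from transformers import AutoModelForCausalLM, AutoTokenizer

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(system_prompt: str, user_msg: str) -> str:
    # The Llama tokenizer prepends the BOS token (<s>) by default,
    # so the string itself only needs the INST/SYS markers.
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{user_msg.strip()} {E_INST}"

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" assumes the accelerate package is installed.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = build_prompt(
    "You are a helpful assistant. Answer concisely.",   # placeholder
    "Summarize the plot of Hamlet in two sentences.",   # placeholder
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Skip the prompt tokens so only the newly generated answer is decoded.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

For multi-turn conversations, each completed exchange also needs to be closed with the EOS token (</s>) and the next turn opened with a fresh BOS (<s>); the format_tokens() function linked above handles that bookkeeping, so for anything beyond a single turn it is safer to reuse it than to build the string by hand.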