Hello! All of a sudden, the Llama-2-70b-chat model has started returning incomplete text. Even with the exact same input, the result is a truncated version of the output it returned correctly before.
Example:
curl --request POST \
--url https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{"inputs": "Please write a long, very long poem about mars and cats"}'
Result:
[
{
"generated_text": "Please write a long, very long poem about mars and cats.\n\nMars, the red planet, a world so vast,\nA place where no"
}
]
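In case it helps to rule out the default generation limit: the Inference API accepts a `parameters` object alongside `inputs`, and `max_new_tokens` controls the generation budget (the default is backend-dependent, so this is a sketch, not a confirmed fix). The request above could be rewritten with an explicit limit like this; the `HF_API_TOKEN` environment variable and the token count of 1000 are my own assumptions:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf"

# Same prompt as the curl example above, with an explicit generation budget.
payload = {
    "inputs": "Please write a long, very long poem about mars and cats",
    "parameters": {"max_new_tokens": 1000},  # assumed value; ask for a longer completion
}
body = json.dumps(payload).encode("utf-8")

# Only call the API when a token is available (HF_API_TOKEN is a hypothetical
# env var name); otherwise just print the payload that would be sent.
token = os.environ.get("HF_API_TOKEN")
if token:
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))
else:
    print(body.decode("utf-8"))
```

If the output was previously long without this parameter, a changed server-side default would be consistent with what you are seeing.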
Has something changed in the last few days?