Hello! All of a sudden, the Llama-2-70b-chat model has started returning incomplete text. Even with the exact same input, the result is a truncated version of the output it returned correctly before.
Example:
curl --request POST \
--url https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{"inputs": "Please write a long, very long poem about mars and cats"}'
Result:
[
{
"generated_text": "Please write a long, very long poem about mars and cats.\n\nMars, the red planet, a world so vast,\nA place where no"
}
]
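In case it helps to rule out the default generation limit: the Inference API accepts a `parameters` object alongside `inputs`, and `max_new_tokens` controls the generation budget (the default is backend-dependent, so this is a sketch, not a confirmed fix). The request above could be rewritten with an explicit limit like this; the `HF_API_TOKEN` environment variable and the token count of 1000 are my own assumptions:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf"

# Same prompt as the curl example above, with an explicit generation budget.
payload = {
    "inputs": "Please write a long, very long poem about mars and cats",
    "parameters": {"max_new_tokens": 1000},  # assumed value; ask for a longer completion
}
body = json.dumps(payload).encode("utf-8")

# Only call the API when a token is available (HF_API_TOKEN is a hypothetical
# env var name); otherwise just print the payload that would be sent.
token = os.environ.get("HF_API_TOKEN")
if token:
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))
else:
    print(body.decode("utf-8"))
```

If the output was previously long without this parameter, a changed server-side default would be consistent with what you are seeing.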
Has something changed in the last few days?