I have started developing LLM-style models, and things were going well: I had this one working a couple of weeks ago and my friends tried it successfully.
For some reason, I can now use neither my Space nor the inference provider; I get the following error: "Server amusktweewt/tiny-model-500M-chat-v2 does not seem to support chat completion. Error: Model amusktweewt/tiny-model-500M-chat-v2 does not exist".
I don't know what is happening because I changed nothing: the repo has been frozen for about a month, it worked fine during that time, and the model also works fine locally with a pipeline.
from huggingface_hub import InferenceClient

HF_TOKEN = "hf_my_valid_pro_token"
# HF_TOKEN = None  # if I omit the token, the call fails with a 503 error instead

client = InferenceClient(
    provider="hf-inference",
    api_key=HF_TOKEN,
)

messages = [
    {"role": "user", "content": "What is the capital of France?"}
]

completion = client.chat.completions.create(
    model="amusktweewt/tiny-model-500M-chat-v2",
    messages=messages,
    max_tokens=500,
)
print(completion.choices[0].message)
# Previously this printed, e.g.:
# ChatCompletionOutputMessage(role='assistant', content='OUP for France - reduced price comparison board (BUFF) is the payoff for carbon emissions.', tool_calls=None)
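For comparison, this is roughly how I run the same checkpoint locally with a `transformers` pipeline, which still works (a minimal sketch; the exact prompt format is just an illustration, not my full setup):

```python
from transformers import pipeline

# Load the same checkpoint from the Hub and run it locally.
pipe = pipeline("text-generation", model="amusktweewt/tiny-model-500M-chat-v2")

out = pipe("What is the capital of France?", max_new_tokens=50)
print(out[0]["generated_text"])
```

So the weights and config on the Hub seem fine; only the hosted inference route reports that the model "does not exist".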