I’m deploying a model on Hugging Face Inference Endpoints, but I’m running into issues when trying to use it for chat completion. The current setup seems to default to text-generation, even though I want to use it in a conversational chat format (where inputs carry a `"role": "user"` field and similar conversational structure).
I received the following error:
```
Invalid inference output: Expected ChatCompletionOutput. Use the 'request' method with the same parameters to do a custom call with no type checking.
```
After inspecting the model card metadata, I realized the pipeline tag is either missing or set to the wrong task (`text-generation`), while my input/output is structured more like a chat model’s.
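For context, here’s roughly the call that triggers the error, via `@huggingface/inference` (the token and endpoint URL below are placeholders):

```typescript
import { HfInference } from "@huggingface/inference";

// Placeholders: substitute your own token and Inference Endpoint URL.
const endpoint = new HfInference("hf_xxx")
  .endpoint("https://<endpoint-name>.endpoints.huggingface.cloud");

// This throws "Invalid inference output: Expected ChatCompletionOutput"
// when the endpoint replies with a plain text-generation payload.
const out = await endpoint.chatCompletion({
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 64,
});
console.log(out.choices[0].message.content);
```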
Configuration:
- Model: `Gragroo/Solenai-v-0-2-1`
- Endpoint Task: Custom
- Current Tags: `text-generation`, `autotrain`, `peft`
- Widget Example: Structured for a conversation with role-based inputs (e.g., `{"role": "user", "content": "What is your favorite condiment?"}`).
Steps to Resolve:
- Update Pipeline Tag: Modify the model card metadata to set the `pipeline_tag` to `chat-completion`, so the endpoint knows it’s handling a conversation task, not plain text generation:

  ```yaml
  pipeline_tag: chat-completion
  ```
- Test the Model: Ensure the inputs are structured correctly (e.g., `"role": "user"`) and test against the updated endpoint (see the sketch after this list).
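For that test step, the raw `request` call the error message points to is handy for seeing what the endpoint actually returns. A minimal sketch, reusing the placeholder URL from above (the payload shape here is my assumption, based on a text-generation-style endpoint):

```typescript
import { HfInference } from "@huggingface/inference";

const endpoint = new HfInference("hf_xxx") // placeholder token
  .endpoint("https://<endpoint-name>.endpoints.huggingface.cloud"); // placeholder URL

// Untyped call, as the error message suggests, to inspect the raw
// payload and confirm whether the endpoint speaks chat or plain text.
const raw = await endpoint.request({
  inputs: "What is the capital of France?",
  parameters: { max_new_tokens: 64 },
});
console.log(JSON.stringify(raw, null, 2));
```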
Is updating the pipeline tag sufficient to switch the model to handle conversational inputs, or are there additional configuration steps I need to follow? What would be the best way to ensure my model processes chat completions correctly?
Looking forward to your suggestions!
Example Input/Output:
- Input:

  ```json
  [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
  ```
- Expected Output:

  ```json
  {
    "choices": [
      {
        "delta": {
          "content": "The capital of France is Paris."
        }
      }
    ]
  }
  ```
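One note on the shape above: as far as I understand, `delta` is what the streaming variant returns per chunk, while the non-streamed response nests the text under `message` instead. A sketch of consuming the stream, with the same placeholder endpoint as before:

```typescript
import { HfInference } from "@huggingface/inference";

const endpoint = new HfInference("hf_xxx") // placeholder token
  .endpoint("https://<endpoint-name>.endpoints.huggingface.cloud"); // placeholder URL

// Streamed chunks carry `choices[0].delta.content`, matching the
// "delta" field in the expected output above.
for await (const chunk of endpoint.chatCompletionStream({
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 64,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```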
Thanks for your help!
Here’s a side-by-side example (left: my model, right: Llama-3.1-8B-Instruct), and the logs (nothing to see there).