Overview
I’m trying to deploy a fine-tuned LLM via Hugging Face Inference Endpoints. However, following the instructions here, when I select my model I get the message:

Warning: deploying this model will probably fail because no "handler.py" file was found in the repository. Try selecting a different model or creating a custom handler.
Question
What do I need to include in the model repo for Hugging Face to recognize the model as one it can serve itself?
Details
The model is fine-tuned from Llama 3.1 8B, and from what I can tell Hugging Face can detect and run other Llama 3.1 variants without a custom handler. For example, arcee-ai/Llama-3.1-SuperNova-Lite doesn’t give me the same error in the endpoint UI.
Since Inference Endpoints is a pay-as-you-go service, I think it would be safer to consult Expert Support…

In any case, there is also the option of writing handler.py, but if you already have a model repository that can be used with Transformers, setting the TRUST_REMOTE_CODE environment variable should make it work.
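As a cheap sanity check before spinning up an endpoint, you can verify that the repository loads with plain Transformers. This is a minimal sketch: the repo id is a placeholder for your own model, and trust_remote_code=True is, as far as I know, the library-side switch that the TRUST_REMOTE_CODE environment variable toggles inside the endpoint container:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with your fine-tuned model's repo id.
repo_id = "your-username/llama-3.1-8b-finetune"

# trust_remote_code=True mirrors the TRUST_REMOTE_CODE endpoint setting;
# only enable it for repositories whose code you have reviewed.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# If both load cleanly, the repo has the standard Transformers layout
# (config.json, tokenizer files, weights) that Endpoints can auto-detect.
print(model.config.model_type)  # expect "llama" for a Llama 3.1 fine-tune
```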
Hi @cbolles! A custom handler may be required for custom tasks, including custom pre- & post-processing. We have additional details on creating and adding a custom handler to your model to use with Inference Endpoints here: Create custom Inference Handler.
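For reference, the contract from that guide is a handler.py file at the root of the model repository exposing an EndpointHandler class with __init__ and __call__ methods. Here’s a minimal sketch assuming a plain text-generation model; the payload keys and generation parameters shown are illustrative, so check the guide for the exact format your task needs:

```python
# handler.py, placed at the root of the model repository
from typing import Any, Dict, List

from transformers import pipeline


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repository contents inside the endpoint container
        self.pipeline = pipeline("text-generation", model=path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Expected request body: {"inputs": "...", "parameters": {...}}
        inputs = data.pop("inputs", "")
        parameters = data.pop("parameters", {})  # e.g. {"max_new_tokens": 64}
        return self.pipeline(inputs, **parameters)
```

If the handler needs extra dependencies, they can be pinned in a requirements.txt in the same repository.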
I also wanted to mention our Inference Endpoints catalog of ready-to-deploy models that require no additional customization and whose deployment is verified by Hugging Face: Inference Catalog | Inference Endpoints by Hugging Face.
Hope this helps and let us know if you have other questions!