Requirements for Hosting LLM via Inference Endpoints

Overview

I’m trying to deploy a fine-tuned LLM via Hugging Face Inference Endpoints. However, following the instructions here, when I select my model I get the message:

Warning: deploying this model will probably fail because no "handler.py" file was found in the repository. Try selecting a different model or creating a custom handler.

Question

What do I need to include in the model repo for Hugging Face to recognize the model as one it can serve itself?

Details

The model is fine-tuned from Llama 3.1 8B, and from what I can tell Hugging Face can detect and run other Llama 3.1 variants without a custom handler. For example, arcee-ai/Llama-3.1-SuperNova-Lite doesn’t give me the same error in the endpoint UI.


Since Inference Endpoints is a pay-as-you-go service, I think it would be safer to consult Expert Support.

In any case, there is also the option of writing a handler.py, but if you already have a model repository that can be loaded with Transformers, setting the TRUST_REMOTE_CODE environment variable on the endpoint should make it work.
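If you prefer to do this programmatically instead of through the endpoint UI, here is a rough sketch using huggingface_hub's create_inference_endpoint, passing the environment variable through a custom TGI image. The endpoint name, repository, instance choices, and image tag are placeholders, and exact parameters may vary with your huggingface_hub version, so treat it as a starting point rather than a verified recipe:

```python
# Sketch: create an Inference Endpoint for a fine-tuned repo and set
# TRUST_REMOTE_CODE in the container environment. Names, instance types,
# and the image tag below are placeholders to adapt to your setup.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-llama31-endpoint",                     # endpoint name (placeholder)
    repository="your-username/your-finetune",  # your model repo (placeholder)
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "health_route": "/health",
        "env": {
            "MODEL_ID": "/repository",       # serve the files from this repo
            "TRUST_REMOTE_CODE": "true",     # allow custom modeling code
        },
    },
)

endpoint.wait()      # block until the endpoint reports it is running
print(endpoint.url)  # URL to send inference requests to
```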

Hi @cbolles! A custom handler may be required for custom tasks, including custom pre- & post-processing. We have additional details on creating and adding a custom handler to your model to use with Inference Endpoints here: Create custom Inference Handler.
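For reference, a custom handler is just a handler.py at the root of the model repo that exposes an EndpointHandler class, as described in the linked docs. Below is a minimal, untested sketch for a text-generation model; the dtype, device placement, and default generation parameters are assumptions you would want to adapt to your fine-tune:

```python
# handler.py — minimal sketch of the EndpointHandler interface for a
# Transformers causal-LM. Loading/generation details are assumptions.
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the repository files copied into the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype=torch.bfloat16, device_map="auto"
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Inference Endpoints sends {"inputs": ..., "parameters": {...}}.
        prompt = data["inputs"]
        params = data.get("parameters", {}) or {"max_new_tokens": 256}
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **params)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```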

I also wanted to mention our Inference Endpoints catalog of ready-to-deploy models that require no additional customization and whose deployment is verified by Hugging Face: Inference Catalog | Inference Endpoints by Hugging Face.

Hope this helps and let us know if you have other questions!
