Requirements for Hosting LLM via Inference Endpoints

Overview

I’m trying to deploy a fine-tuned LLM via Hugging Face Inference Endpoints. However, following the instructions here, when I select my model I get the message:

Warning: deploying this model will probably fail because no "handler.py" file was found in the repository. Try selecting a different model or creating a custom handler.

Question

What do I need to include in the model repo for Hugging Face to recognize the model as one it can serve itself?

Details

The model is fine-tuned from Llama 3.1 8B, and from what I can tell Hugging Face can detect and run other Llama 3.1 variants without a custom handler. For example, arcee-ai/Llama-3.1-SuperNova-Lite doesn’t give me the same error in the endpoint UI.


Since Inference Endpoints is a pay-as-you-go service, I think it would be safer to consult Expert Support.

In any case, there is also the option of writing a handler.py, but if you already have a model repository that can be loaded with Transformers, setting the TRUST_REMOTE_CODE environment variable on the endpoint should make it work.
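If you prefer to do this programmatically instead of through the endpoint UI, here is a rough sketch using huggingface_hub's create_inference_endpoint, passing the environment variable through a custom TGI image. The endpoint name, repository, instance choices, and image tag are placeholders, and exact parameters may vary with your huggingface_hub version, so treat it as a starting point rather than a verified recipe:

```python
# Sketch: create an Inference Endpoint for a fine-tuned repo and set
# TRUST_REMOTE_CODE in the container environment. Names, instance types,
# and the image tag below are placeholders to adapt to your setup.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "my-llama31-endpoint",                     # endpoint name (placeholder)
    repository="your-username/your-finetune",  # your model repo (placeholder)
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "health_route": "/health",
        "env": {
            "MODEL_ID": "/repository",       # serve the files from this repo
            "TRUST_REMOTE_CODE": "true",     # allow custom modeling code
        },
    },
)

endpoint.wait()      # block until the endpoint reports it is running
print(endpoint.url)  # URL to send inference requests to
```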

Hi @cbolles! A custom handler may be required for custom tasks, including custom pre- & post-processing. We have additional details on creating and adding a custom handler to your model to use with Inference Endpoints here: Create custom Inference Handler.
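For reference, a custom handler is just a handler.py at the root of the model repo that exposes an EndpointHandler class, as described in the linked docs. Below is a minimal, untested sketch for a text-generation model; the dtype, device placement, and default generation parameters are assumptions you would want to adapt to your fine-tune:

```python
# handler.py — minimal sketch of the EndpointHandler interface for a
# Transformers causal-LM. Loading/generation details are assumptions.
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the repository files copied into the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype=torch.bfloat16, device_map="auto"
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Inference Endpoints sends {"inputs": ..., "parameters": {...}}.
        prompt = data["inputs"]
        params = data.get("parameters", {}) or {"max_new_tokens": 256}
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **params)
        text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```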

I also wanted to mention our Inference Endpoints catalog of ready-to-deploy models that require no additional customization and whose deployment is verified by Hugging Face: Inference Catalog | Inference Endpoints by Hugging Face.

Hope this helps and let us know if you have other questions!
