Guide/Tutorial to write an inference endpoint for custom models

Are there any docs/guides/tutorials to write custom endpoints to be hosted on Huggingface Hub’s inference endpoints?

We’re particularly looking at these models, which do things a little differently from normal Hugging Face AutoModels (they’ll need an A100):

And potentially also these, which run on a T4:


Hello, have you found any guides that were useful to you? We need to write a handler.py file for a Mistral 7B model that we fine-tuned using Unsloth, so we can deploy it on an Inference Endpoint.

Hi @Saran12, have you found a solution for writing handler.py for an Unsloth fine-tuned Mistral 7B? I am facing the same issue.


Hi @DisgustingOzil, our error came from an incorrect procedure: instead of merging the adapter weights into the base model, we uploaded just the adapter weights and tried to deploy those.

After reading Config.json is not saving after finetuning Llama 2 - #5 by hemanthkumar23, we used the merge_and_unload() function to produce a complete model that was deployable. Unfortunately I don’t know how to reduce the size of the merged model. Hope this helps!
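For anyone landing here later, a minimal sketch of that merge step, assuming a PEFT LoRA adapter. The base-model and adapter repo IDs are placeholders you’d swap for your own:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-7B-v0.1"          # placeholder base model ID
ADAPTER_ID = "your-username/mistral-7b-adapter"  # placeholder adapter repo ID

# Load the base model; fp16 keeps the merged checkpoint smaller on disk.
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)

# Wrap it with the trained LoRA adapter.
model = PeftModel.from_pretrained(base, ADAPTER_ID)

# merge_and_unload() folds the LoRA weights into the base weights and
# returns a plain transformers model (no PEFT wrapper), which is what
# the Inference Endpoint expects to find in the repo.
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
# merged.push_to_hub("your-username/mistral-7b-merged")  # then deploy this repo
```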

Hi,

This blog post shows how to deploy any custom model: Custom Inference with Hugging Face Inference Endpoints.
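For reference, here is a minimal sketch of the handler.py interface that post describes, applied to a text-generation model. The EndpointHandler class with `__init__(path)` and `__call__(data)` is the documented entry point; the generation setup below is illustrative:

```python
# handler.py -- placed at the root of the model repository.
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repository contents on the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype=torch.float16, device_map="auto"
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Requests arrive as {"inputs": ..., "parameters": {...}}.
        inputs = data["inputs"]
        parameters = data.get("parameters", {})
        tokens = self.tokenizer(inputs, return_tensors="pt").to(self.model.device)
        output = self.model.generate(**tokens, **parameters)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```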
