Guide/Tutorial to write an inference endpoint for custom models

Are there any docs/guides/tutorials to write custom endpoints to be hosted on Huggingface Hub’s inference endpoints?

We’re particularly looking at these models, which do things a little differently from normal Hugging Face AutoModels (they’ll need an A100):

And potentially also these, which run on a T4:


Hello, have you found any guides that were useful to you? We need to write a handler.py file for a Mistral 7B model that we fine-tuned using Unsloth, so we can deploy it on an Inference Endpoint.

Hi @Saran12, have you found a solution for writing handler.py for an Unsloth fine-tuned Mistral 7B? I am facing the same issue.


Hi @DisgustingOzil, our error came from an incorrect procedure: instead of merging the adapter weights into the base model, we uploaded just the adapter weights and tried to deploy those.

After reading Config.json is not saving after finetuning Llama 2 - #5 by hemanthkumar23, we used the merge_and_unload() function to produce a complete model that was deployable. Unfortunately I don’t know how to reduce the size of the merged model. Hope this helps!
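For anyone landing here later, a minimal sketch of that merge step, assuming a PEFT LoRA adapter. The base-model and adapter repo IDs are placeholders you’d swap for your own:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "mistralai/Mistral-7B-v0.1"          # placeholder base model ID
ADAPTER_ID = "your-username/mistral-7b-adapter"  # placeholder adapter repo ID

# Load the base model; fp16 keeps the merged checkpoint smaller on disk.
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)

# Wrap it with the trained LoRA adapter.
model = PeftModel.from_pretrained(base, ADAPTER_ID)

# merge_and_unload() folds the LoRA weights into the base weights and
# returns a plain transformers model (no PEFT wrapper), which is what
# the Inference Endpoint expects to find in the repo.
merged = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
# merged.push_to_hub("your-username/mistral-7b-merged")  # then deploy this repo
```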

Hi,

This blog post shows how to deploy any custom model: Custom Inference with Hugging Face Inference Endpoints.
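For reference, here is a minimal sketch of the handler.py interface that post describes, applied to a text-generation model. The EndpointHandler class with `__init__(path)` and `__call__(data)` is the documented entry point; the generation setup below is illustrative:

```python
# handler.py -- placed at the root of the model repository.
from typing import Any, Dict, List

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repository contents on the endpoint.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype=torch.float16, device_map="auto"
        )

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Requests arrive as {"inputs": ..., "parameters": {...}}.
        inputs = data["inputs"]
        parameters = data.get("parameters", {})
        tokens = self.tokenizer(inputs, return_tensors="pt").to(self.model.device)
        output = self.model.generate(**tokens, **parameters)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```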
