Guide/Tutorial to write an inference endpoint for custom models

Saran12 · June 11, 2024, 5:51am

Hi @DisgustingOzil, the error was caused by an incorrect procedure on our side: instead of merging the adapter weights with the base model, we uploaded only the adapter weights and tried to deploy those.

After reading the thread "Config.json is not saving after finetuning Llama 2" (post #5 by hemanthkumar23), we used the merge_and_unload() function to produce a complete model, and that upload was deployable. Unfortunately, I don't know how to reduce the size of the merged model. Hope this helps!
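For anyone hitting the same problem, here is a rough sketch of the merge step, not our exact script. It assumes a causal LM (the linked thread is about Llama 2) and a LoRA adapter saved with PEFT. The function names `is_adapter_only` and `merge_adapter` and their arguments are hypothetical and only for illustration. The first helper checks whether a folder holds just the adapter (PEFT writes `adapter_config.json`, while a full model has `config.json`); the second merges and saves a standalone model.

```python
import os


def is_adapter_only(model_dir: str) -> bool:
    """Return True if the folder has PEFT adapter files but no full-model config."""
    files = set(os.listdir(model_dir))
    return "adapter_config.json" in files and "config.json" not in files


def merge_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Merge LoRA adapter weights into the base model and save the result."""
    # Imported here so the check above works without these packages installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model is a causal LM.
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base, adapter_dir)

    # Fold the adapter weights into the base layers and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Writes config.json and the full weights, which the endpoint can load.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
```

If `is_adapter_only("path/to/your/folder")` returns True, the folder is probably not deployable on its own and needs the merge step first. The merged folder will be about the size of the base model, since it contains all of its weights.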
