How to deploy a fine-tuned LLaVA model on Hugging Face Inference Endpoints using vLLM?

I was going over this article (Deploy open LLMs with vLLM on Hugging Face Inference Endpoints), and it says we need a custom container. Is that a hard requirement, or is it enough to add custom dependencies via requirements.txt (Add custom Dependencies)? Any examples or instructions for deploying multimodal models like LLaVA with vLLM would also help!
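For concreteness, here's a rough sketch of the handler.py I had in mind if the requirements.txt route works (with `vllm` and `pillow` in requirements.txt). The `EndpointHandler` structure follows the custom-handler docs; the vLLM multimodal call is pieced together from the vLLM examples, and the base64 image field and prompt template are just my assumptions:

```python
# handler.py — rough sketch, not tested on an actual endpoint
import base64
from io import BytesIO

from PIL import Image
from vllm import LLM, SamplingParams


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` should point at the repo with the fine-tuned LLaVA weights
        self.llm = LLM(model=path)
        self.sampling_params = SamplingParams(max_tokens=256)

    def __call__(self, data: dict) -> list:
        inputs = data["inputs"]
        # assuming the image arrives as a base64-encoded string
        image = Image.open(BytesIO(base64.b64decode(inputs["image"])))
        # LLaVA-1.5-style prompt template with an <image> placeholder
        prompt = f"USER: <image>\n{inputs['text']}\nASSISTANT:"

        outputs = self.llm.generate(
            {"prompt": prompt, "multi_modal_data": {"image": image}},
            self.sampling_params,
        )
        return [{"generated_text": outputs[0].outputs[0].text}]
```

Does something like this stand a chance of working on a standard endpoint, or does vLLM's need for direct GPU access make the custom container unavoidable?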