How to Finetune and deploy LLaVA-1.6

Im new here and I’m facing the following issues:

  1. I was playing with llava-v1-6-vicuna-13b. Having looked at the github repo with info on how to fine-tune the same, I was able to finetune the model. I uploaded the model to Huggingface on a private repo and I used the deploy using interface endpoints feature but got the following error:

huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error.
Repository Not Found for url:
Please make sure you specified the correct repo_idandrepo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

  1. When I was using the LlavaNextProcessor library from this link, I was unable to use it on my fine-tuned model and there are no instructions on how to fine-tune using the Library if possible or any other ways.

Any help would be highly appreciated

Hi, could you please share the github repo that you used to fine-tune llava 1.6

I used the following Repo:

Hi @PritamSriramG, I wanted to know which data set did you use?


I have used a custom data which I generated. It is of the format specified by the Repo and I was able to finetune using the github repo. The problem arises when I try to use HF libraries. I want to train using HF because, Im planning to host the model in cloud. So, it would be great if you could provide me information on how to finetune using HF or any info on how to host the finetuned model.


Fine-tuning of LLaVa-Next should now work out-of-the-box as shown in the notebook here: Transformers-Tutorials/LLaVa at master · NielsRogge/Transformers-Tutorials · GitHub. Make sure to replace the processor, model and chat templates by the one of LLaVa-Next instead of LLaVa.

Regarding deployment, TGI (a framework meant for deployment of LLMs and multimodal models) now supports LLaVa 1.6. See the guide here: Vision Language Model Inference in TGI

Hi @nielsr ,

Thanks for your suggestions. When I was trying to run your notebook, I got the following error:

ImportError: Usingbitsandbytes8-bit quantization requires Accelerate:pip install accelerateand the latest version of bitsandbytes:pip install -i Simple index bitsandbytes

at the line:

model = LlavaForConditionalGeneration.from_pretrained(

even after installing the libraries, it remained the same. Can you kindly let me know how to go ahead with this issue?


Are you running on a CUDA compatible device (not on CPU)? Otherwise restarting the runtime might help.