I have finetuned a Mistral model for text classification and want to deploy it. I also want to deploy the Mistral base model for generation. Something like a FastAPI app with two endpoints: one for text generation with the Mistral base model and one for classification with my finetuned version. But I want to load the base model only once and switch between the two by turning the adapter on and off. Is that possible? Is there an example out there?
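To make it concrete, this is roughly the switching mechanism I have in mind, sketched with PEFT's `disable_adapter` (the adapter path is a placeholder, and I'm assuming this is the right API for it):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "my-adapter")  # placeholder adapter path

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)

# adapter active: finetuned behaviour
finetuned_out = model.generate(**inputs, max_new_tokens=32)

# adapter temporarily disabled: plain base-model behaviour
with model.disable_adapter():
    base_out = model.generate(**inputs, max_new_tokens=32)
```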
TGI (text-generation-inference) is a framework aimed at deploying LLMs/multimodal models. Besides that, there's also vLLM, which supports multi-LoRA serving: Using LoRA adapters — vLLM.
Both TGI and vLLM offer OpenAI API compatibility, which means that you can call the models in the same way as you would call OpenAI models.
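For example, a vLLM server started with LoRA support can serve the base model and an adapter under different model names, and you pick between them per request; something along these lines should work (the adapter name and path are placeholders):

```python
from openai import OpenAI

# Server started beforehand with (adapter name/path are placeholders):
#   vllm serve mistralai/Mistral-7B-v0.1 --enable-lora \
#       --lora-modules gen-adapter=/path/to/adapter
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Base model: request it by the base model name.
base_out = client.completions.create(
    model="mistralai/Mistral-7B-v0.1", prompt="Hello", max_tokens=64
)

# LoRA adapter: request the adapter name instead.
lora_out = client.completions.create(
    model="gen-adapter", prompt="Hello", max_tokens=64
)
print(base_out.choices[0].text)
print(lora_out.choices[0].text)
```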
I have a finetuned AutoModelForSequenceClassification version with the LoRA task type ‘SEQ_CLS’.
I have a second finetuned AutoModelForCausalLM version with the LoRA task type ‘CAUSAL_LM’.
I want to create an API with three endpoints:
1. Mistral base model → generate (no LoRA)
2. Finetuned classification model (LoRA)
3. Finetuned generation model (LoRA)
I want to deploy the base model once and then switch the adapters on/off depending on the request. Combining 1. and 3. is, I think, simple and works. But I am not sure whether the classification LoRA can be incorporated here, since SEQ_CLS also saves a classification head that the causal-LM base does not have (see the sketch below).
(Context: I have an old server NVIDIA card with 24 GB of VRAM.)
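For reference, this is roughly what I have sketched so far. It keeps two copies of the backbone, one causal-LM and one sequence-classification, because I don't see how to hang the SEQ_CLS head off the causal-LM base. The adapter paths, `num_labels`, and the 4-bit loading (to squeeze both copies into 24 GB) are just my assumptions:

```python
import torch
from fastapi import FastAPI
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
)

BASE = "mistralai/Mistral-7B-v0.1"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(BASE)

# Copy 1: causal-LM backbone with the generation adapter attached
# ("gen-adapter" is a placeholder path).
gen_model = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(
        BASE, quantization_config=bnb, device_map="auto"
    ),
    "gen-adapter",
)

# Copy 2: sequence-classification backbone for the SEQ_CLS adapter
# ("cls-adapter" is a placeholder; num_labels must match the finetune).
cls_model = PeftModel.from_pretrained(
    AutoModelForSequenceClassification.from_pretrained(
        BASE, num_labels=2, quantization_config=bnb, device_map="auto"
    ),
    "cls-adapter",
)

app = FastAPI()

@app.post("/generate")  # endpoint 1: base model, adapter switched off
def generate(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to(gen_model.device)
    with gen_model.disable_adapter():
        out = gen_model.generate(**inputs, max_new_tokens=128)
    return {"text": tokenizer.decode(out[0], skip_special_tokens=True)}

@app.post("/classify")  # endpoint 2: finetuned classifier
def classify(text: str):
    inputs = tokenizer(text, return_tensors="pt").to(cls_model.device)
    with torch.no_grad():
        logits = cls_model(**inputs).logits
    return {"label": int(logits.argmax(dim=-1))}

@app.post("/generate-finetuned")  # endpoint 3: generation adapter active
def generate_finetuned(prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt").to(gen_model.device)
    out = gen_model.generate(**inputs, max_new_tokens=128)
    return {"text": tokenizer.decode(out[0], skip_special_tokens=True)}
```

With 4-bit quantization both backbones should fit comfortably on the 24 GB card, but if there's a way to share a single backbone between the two heads I'd prefer that.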