LLM Inference hosting issue

I have a fine-tuned LLM that needs to be deployed on AWS for inference. I have built an API that takes the text of a user's query and replies with the answer text. My question: I need to serve more than one consumer at a time, each using the LLM to generate text. What is the best way to handle this, given that the LLM is hosted on a single AWS instance and I don't want to make that instance elastic?
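
To make the setup concrete, here is a rough sketch of the kind of API I mean. FastAPI, the semaphore limit, and the `generate_answer` stub below are just illustrative placeholders, not my actual code:

```python
# Rough sketch: one model on one instance, with concurrent requests
# limited by a semaphore so the GPU is not oversubscribed.
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Allow only a few generations at a time on the single instance (placeholder value).
MAX_CONCURRENT_GENERATIONS = 2
gpu_slots = asyncio.Semaphore(MAX_CONCURRENT_GENERATIONS)


class Query(BaseModel):
    text: str


def generate_answer(prompt: str) -> str:
    # Placeholder for the real model call (e.g. a transformers pipeline);
    # it just echoes here so the sketch runs end to end.
    return f"stub answer for: {prompt}"


@app.post("/generate")
async def generate(query: Query):
    async with gpu_slots:
        # Run the blocking model call in a worker thread so the event loop
        # can keep accepting requests from other users in the meantime.
        answer = await asyncio.to_thread(generate_answer, query.text)
    return {"answer": answer}
```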

Hi @316usman, this is not a solution reply but a question instead. Can you please tell me how you fine-tuned your LLM?

@coreaiteam Thanks for asking. I used an AWS SageMaker notebook to load the model and my dataset, fine-tuned the model with QLoRA, and then pushed it to my Hugging Face account.
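
Roughly, the notebook looked like the sketch below. The base model id, LoRA hyperparameters, and the Hub repo name are placeholders for illustration, not the values I actually used:

```python
# Sketch of a QLoRA setup: 4-bit base model + small trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# Load the base model in 4-bit so it fits on a single notebook GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and attach LoRA adapters to train.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# ... run the usual training loop / Trainer on the fine-tuning dataset ...

# Push only the trained adapter weights to the Hub.
model.push_to_hub("my-username/my-finetuned-adapter")  # placeholder repo name
```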

Actually, I have fine-tuned a lot of models for different downstream tasks, such as adding knowledge to the LLM or adjusting its tone for marketing purposes, but those were for a single client's use. Now I am working on deployment through AWS endpoints that my client's whole team would be using at the same time, so I am building APIs so that every user interacts with the model in their own space.
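
By "their own space" I mean roughly per-user conversation state like the sketch below, where one shared model serves everyone but each session keeps its own history. The session handling here is only an illustration of the idea, not my implementation:

```python
# Sketch: one shared model, separate conversation histories keyed by session id.
from collections import defaultdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# session_id -> list of (user_text, model_answer) turns
histories: dict[str, list[tuple[str, str]]] = defaultdict(list)


class ChatRequest(BaseModel):
    session_id: str
    text: str


def generate_answer(prompt: str) -> str:
    # Placeholder for the shared model call.
    return f"stub answer for: {prompt[-80:]}"


@app.post("/chat")
def chat(req: ChatRequest):
    # Build the prompt from this user's own history only.
    history = histories[req.session_id]
    prompt = "".join(f"User: {u}\nAssistant: {a}\n" for u, a in history)
    prompt += f"User: {req.text}\nAssistant:"
    answer = generate_answer(prompt)
    history.append((req.text, answer))
    return {"answer": answer}
```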