Best option for deployment?

Hi everyone, I'm new here and looking forward to getting to know more people in the AI space. I'm still a junior in AI and ML, but I've been a software engineer for 10 years.

I recently built a chatbot using LlamaIndex with RAG, and I'm using Ollama to host a Llama 3 LLM locally. Everything runs on my PC: a Flask API serves as the endpoint, and when the endpoint is hit with a question, the API calls into my LlamaIndex code. This works really well on my PC, which has a strong GPU.
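
For reference, my setup looks roughly like this (a trimmed-down sketch, not my exact code; the `./data` folder, the embedding model, and the `/ask` route are stand-ins, and it assumes the llama-index >= 0.10 package layout):

```python
from flask import Flask, request, jsonify
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Point LlamaIndex at the locally hosted models: Ollama serves Llama 3,
# and a small HF model handles the RAG embeddings.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Build the index once at startup from local documents.
documents = SimpleDirectoryReader("./data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    question = request.get_json().get("question", "")
    answer = query_engine.query(question)
    return jsonify({"answer": str(answer)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```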

My question is: what are my options for deploying this on AWS? The Llama 3 model is about 4 GB, plus there's a second model for the RAG embeddings. There's so much tutorial material on how to build these AI apps, but very little on deploying them in production.

How are companies doing these types of deployments?
Is this the best way to go about deployment?
Is it better to use ChatGPT and pay for tokens?

Please, any advice or guidance would really be appreciated.


This is something I’m trying to figure out too.

This example is one I'm going to try to modify for an MVP: SpacesExamples/fastapi_t5 at main
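
The rough shape of that pattern looks like this (a sketch, not the actual repo code; the model name and route are just illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup; "t5-small" is illustrative.
generator = pipeline("text2text-generation", model="t5-small")

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt):
    out = generator(prompt.text, max_new_tokens=128)
    return {"output": out[0]["generated_text"]}
```

Then you run it with `uvicorn main:app` behind whatever host you pick.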

Otherwise, you can use things like AWS SageMaker, Azure, GCP, or Paperspace.

I think you’d have to swap out Ollama for llama.cpp or HFUF for production.
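
For the llama.cpp route, it would look something like this with the llama-cpp-python bindings (a minimal sketch; the GGUF path and quant level are placeholders for whatever file you download or convert):

```python
from llama_cpp import Llama

# Load a quantized GGUF build of Llama 3 from a local file.
llm = Llama(model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What can you do?"}]
)
print(out["choices"][0]["message"]["content"])
```

llama-cpp-python also ships an OpenAI-compatible server (`python -m llama_cpp.server --model <path>`), so the API side of your app can talk to it like any hosted endpoint.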

Let’s jam on this!


I have the same problem.