How to use AWS and vLLM to make an API available for a running LLM (any open-source model) on an AWS SageMaker GPU

I’m using the command below to do exactly this:

python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8888 --model mistralai/Mistral-7B-Instruct-v0.1 & npx localtunnel --port 8888

in a SageMaker notebook terminal.

When I run this line, vLLM serves the model on localhost, and I tunnel that port with localtunnel to expose it as an API.
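For context, once the server is up I call the endpoint roughly like this (a minimal sketch, assuming the server is reachable at 127.0.0.1:8888 and exposes vLLM's standard OpenAI-compatible /v1/chat/completions route):

# Sketch of a client call against the vLLM OpenAI-compatible server started above;
# assumes it is listening on 127.0.0.1:8888 as in the command.
import requests

response = requests.post(
    "http://127.0.0.1:8888/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["message"]["content"])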

Even though this works, I don't think it is the right technique. Is there another approach where I don't have to tunnel at all, and can instead use AWS services (which are available to me) to expose the API, while still using the vLLM framework?