How to use AWS and vLLM to make an API available for a running LLM (any open-source model) on an AWS SageMaker GPU

I’m using the command below to do exactly this:

python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8888 --model mistralai/Mistral-7B-Instruct-v0.1 & npx localtunnel --port 8888

in a SageMaker notebook terminal.

When I run this line, vLLM serves the model on localhost, and I tunnel that port with localtunnel to expose it as an API.
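For context, once the server is up I call the endpoint roughly like this (a minimal sketch, assuming the server is reachable at 127.0.0.1:8888 and exposes vLLM's standard OpenAI-compatible /v1/chat/completions route):

# Sketch of a client call against the vLLM OpenAI-compatible server started above;
# assumes it is listening on 127.0.0.1:8888 as in the command.
import requests

response = requests.post(
    "http://127.0.0.1:8888/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["message"]["content"])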

Even though this works, I don't think it is the right technique. Is there another approach where I don't have to tunnel at all, and can instead use AWS services (which are available to me) to expose the API, while still using the vLLM framework?