I’m using the command below to do exactly this:
python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8888 --model mistralai/Mistral-7B-Instruct-v0.1 & npx localtunnel --port 8888
in a SageMaker notebook terminal.
When I run this line, the model is served on localhost, and I tunnel that port to expose it as an API.
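For context, the server started by the command above exposes an OpenAI-compatible REST API, so a request to it looks like the sketch below. This assumes the server is up on 127.0.0.1:8888 as in my command; when going through the tunnel, the localtunnel URL would replace the host.

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/completions endpoint.
# The model name matches the one passed to the server at startup.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.1",
    "prompt": "Hello, how are you?",
    "max_tokens": 32,
}

# Sending it (only works while the server is running):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://127.0.0.1:8888/v1/completions",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())

print(json.dumps(payload))
```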
But even though it works, I don’t think this is the right technique. Is there another approach where I don’t have to tunnel, and can instead use AWS services (which are available to me) to expose the API, while still serving the model with the vLLM framework?