When to use a SageMaker multi-model endpoint

I’m wondering when it becomes more efficient to use multi-model endpoints with SageMaker.

Right now I’m working on a project that uses PyTorch/Hugging Face transformer neural nets to classify words in natural language. This is the first model. The second model then takes the output of the first model and runs it through a second transformer in order to calculate a similarity metric against a value in a database.

At first I was going to separate both of these models completely, put them on separate endpoints, and connect the logic together with some wrapper script, but now I’m thinking it may be better to host both models on the same endpoint by utilizing a multi-model endpoint.

Would a multi-model endpoint make more sense for this use case? If so, is there any good article or documentation pertaining to how this can be achieved? Thanks!

@bennicholl I think this really depends on your use case, limitations, budget, and load.

If you have a heavy load, you need to scale the models up and down independently, and the latency profiles of the two differ, then it might make more sense to keep them separate.
If the models are always used sequentially, then you could also put both models into the same endpoint with an inference.py rather than creating a multi-model endpoint; that way you can leverage a GPU (GPUs are currently not supported by MME).
If you have quite infrequent load and no strict latency requirements, then you could go with SageMaker Serverless Inference instead of an MME.
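The single-endpoint idea above can be sketched with the hook functions the SageMaker PyTorch serving container looks for in inference.py (model_fn, input_fn, predict_fn, output_fn). The model subdirectories and pre/post-processing here are placeholders, assuming both models are packaged into one model artifact:

```python
import json
import os

# The serving container calls model_fn once at startup, then
# input_fn -> predict_fn -> output_fn for each request.


def model_fn(model_dir):
    # Load both transformers from the same artifact.
    # (Hypothetical subdirectory layout; adjust to your packaging.)
    from transformers import pipeline
    classifier = pipeline(
        "token-classification", model=os.path.join(model_dir, "classifier"))
    scorer = pipeline(
        "feature-extraction", model=os.path.join(model_dir, "scorer"))
    return {"classifier": classifier, "scorer": scorer}


def input_fn(request_body, content_type="application/json"):
    return json.loads(request_body)["inputs"]


def predict_fn(text, models):
    # Chain the two models: classify the words, then run the first
    # model's output through the similarity scorer.
    labels = models["classifier"](text)
    return models["scorer"](str(labels))


def output_fn(prediction, accept="application/json"):
    return json.dumps({"outputs": str(prediction)})
```

Since both models live in one container, the intermediate result never leaves the process, and a single GPU can serve both stages.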

“If so, is there any good article or documentation pertaining to how this can be achieved?”

Could you please explain what you mean by that?