I’m wondering at what point it becomes more efficient to use multi-model endpoints with SageMaker.
Right now I’m working on a project that uses PyTorch/Hugging Face transformer models to classify words in natural language. This is the first model. The second model takes the output of the first model and runs it through a second transformer to calculate a similarity metric against a value in a database.
At first I was going to keep the two models completely separate, deploy them to separate endpoints, and connect them with a wrapper script. Now I’m thinking it may be better to host both models on the same endpoint using a multi-model endpoint.
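For concreteness, here is a minimal sketch of the wrapper I have in mind, covering both layouts. Everything in it is an assumption for illustration: the endpoint names (`word-classifier`, `similarity-model`, `nlp-mme`), the artifact names passed as `TargetModel`, the `{"inputs": ...}` JSON payload shape, and the helper names `make_sagemaker_invoker` and `classify_then_score` are all hypothetical. The only real API used is boto3's `sagemaker-runtime` `invoke_endpoint`, whose optional `TargetModel` argument picks a model on a multi-model endpoint. As far as I understand, each call to a multi-model endpoint still targets a single model, so the wrapper makes two calls either way.

```python
import json
from typing import Any, Callable, Optional

# An "invoker" sends a JSON payload to an endpoint (optionally to a specific
# model on a multi-model endpoint) and returns the decoded JSON response.
Invoker = Callable[[str, dict, Optional[str]], Any]


def make_sagemaker_invoker(region_name: Optional[str] = None) -> Invoker:
    """Build an invoker backed by boto3's sagemaker-runtime client."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    runtime = boto3.client("sagemaker-runtime", region_name=region_name)

    def invoke(endpoint_name: str, payload: dict,
               target_model: Optional[str] = None) -> Any:
        kwargs = {
            "EndpointName": endpoint_name,
            "ContentType": "application/json",
            "Body": json.dumps(payload),
        }
        if target_model is not None:
            # TargetModel selects which model artifact a multi-model endpoint uses.
            kwargs["TargetModel"] = target_model
        response = runtime.invoke_endpoint(**kwargs)
        return json.loads(response["Body"].read())

    return invoke


def classify_then_score(text: str, invoke: Invoker,
                        use_multi_model: bool = False) -> Any:
    """Run the (hypothetical) word classifier, then feed its output to the
    (hypothetical) similarity model."""
    if use_multi_model:
        # Option B: one multi-model endpoint; each call names its artifact.
        labels = invoke("nlp-mme", {"inputs": text}, "classifier.tar.gz")
        return invoke("nlp-mme", {"inputs": labels}, "similarity.tar.gz")
    # Option A: two separate endpoints chained by this wrapper.
    labels = invoke("word-classifier", {"inputs": text}, None)
    return invoke("similarity-model", {"inputs": labels}, None)
```

Passing the invoker in as a parameter keeps the chaining logic separate from boto3, so the same function covers both layouts and can be exercised with a fake invoker locally.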
Would a multi-model endpoint make more sense for this use case? If so, are there any good articles or documentation on how to set this up? Thanks!