Best practices for choosing instance size for inference

Hi,

I was wondering what a good procedure is for choosing instance sizes when deploying Hugging Face models as SageMaker inference endpoints. Are there resources available on which instance sizes work well for different model sizes, and what performance (latency, invocations per minute, etc.) to expect?

Right now I’m deploying facebook/bart-large-cnn, and I just manually test different instance types to find out what works for our use case, but I feel like this could be done faster.
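For context, my manual sweep looks roughly like the sketch below: deploy the model on each candidate type, time a few invocations, and tear the endpoint down. The role ARN, container versions, candidate instance types, and test payload are placeholders for whatever your setup uses:

```python
import time

from sagemaker.huggingface import HuggingFaceModel

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role ARN
candidate_types = ["ml.m5.xlarge", "ml.g4dn.xlarge", "ml.g5.xlarge"]  # example candidates
payload = {"inputs": "Some long article text to summarize ..."}

for instance_type in candidate_types:
    model = HuggingFaceModel(
        env={"HF_MODEL_ID": "facebook/bart-large-cnn", "HF_TASK": "summarization"},
        role=role,
        transformers_version="4.26",  # pick a version combo supported by the HF DLCs
        pytorch_version="1.13",
        py_version="py39",
    )
    predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
    try:
        predictor.predict(payload)  # warm-up request
        latencies = []
        for _ in range(10):
            start = time.perf_counter()
            predictor.predict(payload)
            latencies.append(time.perf_counter() - start)
        median = sorted(latencies)[len(latencies) // 2]
        print(f"{instance_type}: median latency ~{median:.3f}s")
    finally:
        predictor.delete_endpoint()  # avoid paying for idle endpoints
```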


Don’t know of anything on the Hugging Face side. It would be a great resource, at least for the month that it stays current. :smiley:

From the docs:

The size and type of data can have a great effect on which hardware configuration is most effective. When the same model is trained on a recurring basis, initial testing across a spectrum of instance types can discover configurations that are more cost-effective in the long run. Additionally, algorithms that train most efficiently on GPUs might not require GPUs for efficient inference. Experiment to determine the most cost-effective solution. To get an automatic instance recommendation or conduct custom load tests, use Amazon SageMaker Inference Recommender.
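If you want to try Inference Recommender, kicking off a default job via boto3 looks something like this. A minimal sketch: the job name, role ARN, and model package ARN are placeholders, and it assumes the model has already been registered in the SageMaker Model Registry:

```python
import boto3

sm = boto3.client("sagemaker")

# Start a "Default" job, which benchmarks a standard set of instance types.
# "Advanced" jobs instead run a custom load test you configure yourself.
sm.create_inference_recommendations_job(
    JobName="bart-large-cnn-recs",  # placeholder job name
    JobType="Default",
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    InputConfig={
        # Placeholder ARN of a model package registered in the Model Registry
        "ModelPackageVersionArn": "arn:aws:sagemaker:region:account:model-package/bart-large-cnn/1",
    },
)

# Poll the job, then read the per-instance-type recommendations
# (each entry includes the endpoint configuration and measured metrics).
desc = sm.describe_inference_recommendations_job(JobName="bart-large-cnn-recs")
print(desc["Status"])
for rec in desc.get("InferenceRecommendations", []):
    print(rec["EndpointConfiguration"]["InstanceType"], rec["Metrics"])
```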
