How to configure GPU server-side batching with SageMaker HF Hosting?
I want to process multiple inferences in a single batched GPU call (instead of a CPU-GPU round trip for every request). MMS supports this in its open-source flavor; how can we use it with SageMaker Hugging Face hosting?
I'm not sure how the MMS side of things works; maybe you can ask at Issues · awslabs/multi-model-server · GitHub.
But once MMS is configured, it depends on the task you are using: you might need to create some custom logic in an inference.py, overriding the input_fn or output_fn.
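For reference, in open-source MMS, dynamic batching is a per-model setting applied at registration time through the management API. This is a sketch of that open-source workflow; the hostname, port, and model archive name are placeholders, and the SageMaker container may expose this configuration differently:

```shell
# Register a model with server-side batching in open-source MMS.
# Requests arriving within max_batch_delay (ms) are grouped into a
# single batch of up to batch_size before being passed to the handler.
curl -X POST "http://localhost:8081/models?url=my_model.mar&batch_size=8&max_batch_delay=100"
```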
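To illustrate the inference.py route: the SageMaker Hugging Face Inference Toolkit lets you override the default handler functions by placing an inference.py in your model archive's code/ directory. Below is a minimal sketch, assuming JSON requests with an "inputs" field (the request shape and the model returned by the default model_fn are assumptions); the batching itself still happens on the MMS side, but these functions make sure a batched payload flows through as a list:

```python
# code/inference.py - minimal sketch of custom handler functions for the
# SageMaker Hugging Face Inference Toolkit. If these functions are defined,
# the toolkit calls them instead of its defaults.
import json


def input_fn(request_body, content_type):
    # Deserialize a (possibly batched) JSON request into a list of inputs.
    if content_type == "application/json":
        data = json.loads(request_body)
        inputs = data["inputs"]
        # Normalize a single example into a batch of one, so predict_fn
        # always receives a list.
        return inputs if isinstance(inputs, list) else [inputs]
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(inputs, model):
    # `model` is whatever model_fn returned (by default a Hugging Face
    # pipeline), which accepts a list and runs it as one batched call.
    return model(inputs)


def output_fn(predictions, accept):
    # Serialize the list of predictions back to JSON.
    return json.dumps(predictions)
```

With this in place, a client can send `{"inputs": [...]}` and the whole list goes through the model in one call rather than one invocation per item.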