How do I deploy a hub model to SageMaker and give it a GPU (not Elastic Inference)?

See the answer below from @philschmid.
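For reference, a minimal sketch of deploying a Hub model to a GPU-backed SageMaker endpoint using the `sagemaker` Python SDK's `HuggingFaceModel`. The model ID, framework versions, and instance type are illustrative assumptions, not the only valid choices; the key point is that `instance_type` selects a GPU instance (e.g. `ml.g4dn.xlarge`, which carries an NVIDIA T4) rather than attaching Elastic Inference:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or pass an IAM role ARN explicitly

# Which Hub model to load and which pipeline task to serve (illustrative values)
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
    "HF_TASK": "text-classification",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # assumed versions; pick a supported combination
    pytorch_version="1.13",
    py_version="py39",
)

# Deploying to a GPU instance type gives the endpoint a dedicated GPU
# (here an NVIDIA T4 on g4dn) instead of an Elastic Inference accelerator.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

print(predictor.predict({"inputs": "I love this library!"}))
```

Running this requires AWS credentials and a SageMaker execution role; it is a sketch of the deployment shape, not a copy-paste-ready script for every account.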

In addition, note that modern GPUs have massive parallel compute capacity (2,500+ cores in the T4 of the g4dn instances), and it is hard to keep them busy. The compute workload of a few single-record inferences is thousands of times smaller than training (which runs a backward pass in addition to the forward pass, and batches its compute), so running a handful of inferences manually may not keep the GPU busy enough for any activity to show up in CloudWatch, which produces 1-minute aggregates.
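To make the "thousands of times smaller" claim concrete, here is a rough back-of-envelope comparison; the batch size and backward-pass multiplier are assumed illustrative numbers, not measurements:

```python
# Back-of-envelope FLOP comparison (illustrative, normalized units):
# one single-record inference vs. one training step that batches records
# and also runs a backward pass (commonly estimated at ~2x the forward cost).
forward_cost_per_record = 1.0   # normalized unit of forward-pass compute
batch_size = 512                # assumed training batch size
backward_multiplier = 2.0       # backward pass ~2x the forward pass

inference_step = forward_cost_per_record                               # 1 record, forward only
training_step = batch_size * forward_cost_per_record * (1 + backward_multiplier)

ratio = training_step / inference_step
print(ratio)  # 1536.0 -> a single training step is ~1,500x one inference
```

So a few one-off requests barely register against the GPU's capacity, which is why CloudWatch's 1-minute aggregates can show essentially zero utilization even though the endpoint is working correctly.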
