Hello @bobloki,
Normally, the Inference Toolkit detects whether a GPU is available and uses it automatically. See here: sagemaker-huggingface-inference-toolkit/transformers_utils.py at 7cb5009fef6566199ef47ed9ca2a3de4f81c0844 · aws/sagemaker-huggingface-inference-toolkit · GitHub
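For reference, the device selection in the linked file boils down to something like this (a minimal sketch using the standard transformers pipeline API, not the toolkit's exact code):

# Sketch of the GPU check the toolkit performs (assumption based on the
# linked transformers_utils.py, not the exact implementation).
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
qa_pipeline = pipeline(
    'question-answering',
    model='deepset/roberta-base-squad2',
    device=device,
)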
We haven't seen this issue reported by any other customer.
Could you please update the transformers version and test it again?
P.S. You can go with ml.g4dn.xlarge instead of ml.g4dn.4xlarge to save cost, since both instance types have only one GPU.
You can find a list of available versions here: Reference
Below is your shared snippet, updated to newer versions:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with SageMaker permissions (assumes you run this in a SageMaker notebook)
role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'deepset/roberta-base-squad2',
    'HF_TASK': 'question-answering'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py38',  # the 4.12/1.9 containers ship with Python 3.8, not py36
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,           # number of instances
    instance_type='ml.g4dn.xlarge'      # ec2 instance type
)
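Once the endpoint is up, you can sanity-check it with the standard question-answering payload format (example question and context here are placeholders):

# send a test request to the deployed endpoint
data = {
    "inputs": {
        "question": "What is the model fine-tuned on?",
        "context": "deepset/roberta-base-squad2 is a RoBERTa model fine-tuned on SQuAD 2.0."
    }
}
print(predictor.predict(data))

# clean up when done to stop incurring cost
# predictor.delete_endpoint()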