How do I deploy a hub model to SageMaker and give it a GPU (not Elastic Inference)?

Hello @bobloki,

Normally, the Inference Toolkit detects whether a GPU is available and uses it automatically. See here: sagemaker-huggingface-inference-toolkit/transformers_utils.py at 7cb5009fef6566199ef47ed9ca2a3de4f81c0844 · aws/sagemaker-huggingface-inference-toolkit · GitHub
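
In essence, the toolkit does something like the following (a minimal sketch, not the toolkit's exact code, assuming PyTorch is the backend):

import torch
from transformers import pipeline

# device=0 selects the first GPU; device=-1 falls back to CPU
device = 0 if torch.cuda.is_available() else -1
qa = pipeline(
    'question-answering',
    model='deepset/roberta-base-squad2',
    device=device,
)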

We also haven't seen any other customers run into this issue.

Could you please update the transformers version and test again?
P.S. You can go with ml.g4dn.xlarge instead of ml.g4dn.4xlarge to save cost, since both instance types have only 1 GPU.

You can find a list of available versions here: Reference
Below is your shared snippet, updated to a newer version:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model and task configuration
hub = {
    'HF_MODEL_ID': 'deepset/roberta-base-squad2',
    'HF_TASK': 'question-answering'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py38',  # the 4.12 DLC ships with Python 3.8, not 3.6
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.g4dn.xlarge'  # ec2 instance type
)
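
Once the endpoint is up, you can test it end to end like this (the question/context payload below is made up for illustration):

# send a question-answering request to the endpoint
data = {
    'inputs': {
        'question': 'Which instance type is used?',
        'context': 'The model is deployed on an ml.g4dn.xlarge instance with one GPU.'
    }
}
print(predictor.predict(data))

# delete the endpoint when you are done to stop incurring cost
predictor.delete_endpoint()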