Hello @bobloki,
Normally, the Inference Toolkit detects whether a GPU is available and uses it automatically. See here: sagemaker-huggingface-inference-toolkit/transformers_utils.py at 7cb5009fef6566199ef47ed9ca2a3de4f81c0844 · aws/sagemaker-huggingface-inference-toolkit · GitHub
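For reference, the device selection in the linked file boils down to something like this (a minimal sketch using the standard transformers pipeline API, not the toolkit's exact code):

# Sketch of the GPU check the toolkit performs (assumption based on the
# linked transformers_utils.py, not the exact implementation).
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # 0 = first GPU, -1 = CPU
qa_pipeline = pipeline(
    'question-answering',
    model='deepset/roberta-base-squad2',
    device=device,
)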
We haven't seen this issue reported by any other customer.
Could you please update the transformers version and test it again?
P.S. You can go with ml.g4dn.xlarge instead of ml.g4dn.4xlarge to save cost, since both instance types have only one GPU.
You can find a list of available versions here: Reference
Below is your shared snippet, updated to newer versions:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with SageMaker permissions (assumes you run this in a SageMaker notebook)
role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'deepset/roberta-base-squad2',
    'HF_TASK': 'question-answering'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py38',  # the 4.12/1.9 containers ship with Python 3.8, not py36
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,           # number of instances
    instance_type='ml.g4dn.xlarge'      # ec2 instance type
)
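Once the endpoint is up, you can sanity-check it with the standard question-answering payload format (example question and context here are placeholders):

# send a test request to the deployed endpoint
data = {
    "inputs": {
        "question": "What is the model fine-tuned on?",
        "context": "deepset/roberta-base-squad2 is a RoBERTa model fine-tuned on SQuAD 2.0."
    }
}
print(predictor.predict(data))

# clean up when done to stop incurring cost
# predictor.delete_endpoint()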