How do I deploy a hub model to SageMaker and give it a GPU (not Elastic Inference)?

I believe you have to provide your own inference script if you want to leverage the GPU. This script needs to check whether a GPU is available, i.e. it should contain a line like this:

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Once you have detected the GPU you can call the pipeline API with the device parameter (note that older versions of transformers expect an integer device index here, i.e. device=0 for the first GPU and device=-1 for CPU, rather than a torch.device):

pipeline("question-answering", model=model, tokenizer=tokenizer, device=device)

This notebook shows how to deploy an HF model using your own inference script by providing the entry_point and the source_dir parameters when calling HuggingFaceModel(): text-summarisation-project/4a_model_testing_deployed.ipynb at main · marshmellow77/text-summarisation-project · GitHub
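For completeness, deploying with such a script looks roughly like this. The S3 path, IAM role, framework versions and instance type below are placeholders; the important part is choosing a GPU instance type (e.g. ml.g4dn.xlarge) instead of Elastic Inference, so that torch.cuda.is_available() returns True inside the endpoint:

# Deployment sketch -- paths, role and versions are placeholders.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder S3 path
    role="my-sagemaker-execution-role",        # placeholder IAM role
    entry_point="inference.py",                # the custom script above
    source_dir="code",                         # directory containing it
    transformers_version="4.26",               # pick versions matching your model
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # GPU instance, not Elastic Inference
)

result = predictor.predict({"question": "...", "context": "..."})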