I believe you have to provide your own inference script if you want to leverage the GPU. The script needs to check whether a GPU is available, e.g. with a line like this:
```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
Once you have detected the GPU you can call the Pipeline API with the device parameter, e.g.:
pipeline("question-answering", model=model, tokenizer=tokenizer, device=device)
This notebook shows how to deploy a HF model using your own inference script by providing the entry_point and source_dir parameters when calling HuggingFaceModel(): text-summarisation-project/4a_model_testing_deployed.ipynb (marshmellow77/text-summarisation-project on GitHub)
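For completeness, the deployment call follows this general pattern; the S3 path, container versions, and instance type below are placeholders, so check the notebook and the SageMaker docs for a supported version combination:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder: your model artifact in S3
    role=sagemaker.get_execution_role(),       # your SageMaker execution role
    transformers_version="4.26",               # example versions -- pick a supported combo
    pytorch_version="1.13",
    py_version="py39",
    entry_point="inference.py",                # the custom inference script from above
    source_dir="code",                         # local dir holding the script (and requirements.txt, if any)
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # a GPU instance, so torch.cuda.is_available() is True
)
```

Once deployed, calling predictor.predict() with your payload routes the request through the custom predict_fn in the script.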