How do I deploy a hub model to SageMaker and give it a GPU (not Elastic Inference)?

See the answer below from @philschmid.
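For reference, a minimal sketch of deploying a Hub model to a GPU-backed SageMaker endpoint using the `sagemaker` Python SDK's `HuggingFaceModel`. The model ID, framework versions, and instance type are illustrative assumptions, not the only valid choices; the key point is that `instance_type` selects a GPU instance (e.g. `ml.g4dn.xlarge`, which carries an NVIDIA T4) rather than attaching Elastic Inference:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # or pass an IAM role ARN explicitly

# Which Hub model to load and which pipeline task to serve (illustrative values)
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
    "HF_TASK": "text-classification",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # assumed versions; pick a supported combination
    pytorch_version="1.13",
    py_version="py39",
)

# Deploying to a GPU instance type gives the endpoint a dedicated GPU
# (here an NVIDIA T4 on g4dn) instead of an Elastic Inference accelerator.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

print(predictor.predict({"inputs": "I love this library!"}))
```

Running this requires AWS credentials and a SageMaker execution role; it is a sketch of the deployment shape, not a copy-paste-ready script for every account.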

In addition, note that modern GPUs have massive parallel compute capacity (2,500+ cores in the T4 of the g4dn instances), and it is hard to keep them busy. The compute workload of a few single-record inferences is thousands of times smaller than training (which runs a backward pass in addition to the forward pass, and batches its compute), so running a handful of inferences manually may not keep the GPU busy enough for any activity to show up in CloudWatch, which produces 1-minute aggregates.
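To make the "thousands of times smaller" claim concrete, here is a rough back-of-envelope comparison; the batch size and backward-pass multiplier are assumed illustrative numbers, not measurements:

```python
# Back-of-envelope FLOP comparison (illustrative, normalized units):
# one single-record inference vs. one training step that batches records
# and also runs a backward pass (commonly estimated at ~2x the forward cost).
forward_cost_per_record = 1.0   # normalized unit of forward-pass compute
batch_size = 512                # assumed training batch size
backward_multiplier = 2.0       # backward pass ~2x the forward pass

inference_step = forward_cost_per_record                               # 1 record, forward only
training_step = batch_size * forward_cost_per_record * (1 + backward_multiplier)

ratio = training_step / inference_step
print(ratio)  # 1536.0 -> a single training step is ~1,500x one inference
```

So a few one-off requests barely register against the GPU's capacity, which is why CloudWatch's 1-minute aggregates can show essentially zero utilization even though the endpoint is working correctly.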
