Endpoint reuse & serverless endpoints

Hi all. I’ve been using the Hugging Face SDK for SageMaker, and it’s working well. I have two questions:

  1. It looks like the SDK always deploys Endpoint resources in “real-time” mode. Is support for the new “serverless” mode forthcoming? It could help small devs like me save a lot of money. :wink:

  2. There doesn’t seem to be a way to connect the HF SDK to an existing Endpoint resource. When my Python app restarts, a new Endpoint resource must be created. Are there any plans to let the SDK attach to an existing Endpoint? That would be great, because redeploying a new Endpoint resource takes a lot of time. (And the old ones have to be cleaned up manually, too.)

Thank you!

Hi! Regarding endpoint reuse: is your goal to connect to an endpoint that is already live?
If so, you can use the Predictor class for this:

```python
from sagemaker.huggingface.model import HuggingFacePredictor

# Attach to an endpoint that is already deployed, instead of creating a new one
predictor = HuggingFacePredictor(endpoint_name="<my-existing-endpoint-name>")
```
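If you want your app to pick up the existing endpoint on restart instead of redeploying every time, one option is to look the endpoint up by name first and only deploy when it does not exist. This is a minimal sketch, assuming `sm_client` behaves like `boto3.client("sagemaker")` (only `list_endpoints` is called) and that you supply your own `attach` and `deploy` callables. The helper names `endpoint_status` and `attach_or_deploy` are hypothetical, not part of the SageMaker SDK.

```python
from typing import Any, Callable, Dict, Optional, TypeVar

P = TypeVar("P")


def endpoint_status(sm_client: Any, endpoint_name: str) -> Optional[str]:
    """Return the EndpointStatus of the endpoint named exactly `endpoint_name`,
    or None if no such endpoint exists.

    Assumes `sm_client` is a boto3 SageMaker client (or a compatible fake).
    """
    kwargs: Dict[str, str] = {"NameContains": endpoint_name}
    while True:
        resp = sm_client.list_endpoints(**kwargs)
        for ep in resp.get("Endpoints", []):
            # NameContains is a substring filter, so require an exact name match.
            if ep["EndpointName"] == endpoint_name:
                return ep["EndpointStatus"]
        token = resp.get("NextToken")
        if not token:
            return None
        # Results are paginated; follow NextToken until the listing is exhausted.
        kwargs["NextToken"] = token


def attach_or_deploy(
    sm_client: Any,
    endpoint_name: str,
    attach: Callable[[str], P],
    deploy: Callable[[], P],
) -> P:
    """Reuse a live endpoint if one exists; otherwise deploy a new one."""
    status = endpoint_status(sm_client, endpoint_name)
    if status == "InService":
        return attach(endpoint_name)
    if status is None:
        return deploy()
    # e.g. "Creating", "Updating" or "Failed": deploying under the same name would clash.
    raise RuntimeError(f"Endpoint {endpoint_name!r} exists but is {status}")


# Example wiring (needs AWS credentials and the sagemaker package; not run here).
# `huggingface_model` is a hypothetical HuggingFaceModel you have already configured.
#   import boto3
#   from sagemaker.huggingface.model import HuggingFacePredictor
#   predictor = attach_or_deploy(
#       boto3.client("sagemaker"),
#       "my-hf-endpoint",
#       attach=lambda name: HuggingFacePredictor(endpoint_name=name),
#       deploy=lambda: huggingface_model.deploy(
#           initial_instance_count=1,
#           instance_type="ml.m5.xlarge",
#           endpoint_name="my-hf-endpoint",
#       ),
#   )
#   # When you no longer need it: predictor.delete_endpoint()
```

Passing the client and callbacks in keeps the lookup logic independent of AWS, so it can be exercised with a fake client. Using a fixed `endpoint_name` on every start is what makes the lookup possible in the first place.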

@jellersby about (1): you can already use SageMaker Serverless endpoints with Hugging Face models; see this blog post from @philschmid: https://www.philschmid.de/serverless-transformers-sagemaker-huggingface

You’re correct that the SageMaker Python SDK doesn’t support serverless deployments yet, but you can follow the official instructions, which use boto3 (the generic AWS SDK for Python): Create a serverless endpoint - Amazon SageMaker
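A minimal boto3 sketch of that flow, assuming a SageMaker model named `model_name` already exists (for example, registered with `create_model` using a Hugging Face inference container image). The helper names `serverless_endpoint_config` and `deploy_serverless`, the `-config` naming scheme, and the default memory/concurrency values are illustrative assumptions. The request fields (`ServerlessConfig`, `MemorySizeInMB`, `MaxConcurrency`) are the ones `create_endpoint_config` accepts for a serverless variant.

```python
from typing import Any, Dict

# Memory sizes accepted by ServerlessConfig.MemorySizeInMB (1 GB to 6 GB in 1 GB steps).
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_endpoint_config(
    config_name: str, model_name: str, memory_mb: int = 4096, max_concurrency: int = 5
) -> Dict[str, Any]:
    """Build keyword arguments for create_endpoint_config with one serverless variant."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}, got {memory_mb}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # ServerlessConfig takes the place of InstanceType / InitialInstanceCount.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def deploy_serverless(
    sm_client: Any,
    model_name: str,
    endpoint_name: str,
    memory_mb: int = 4096,
    max_concurrency: int = 5,
) -> str:
    """Create a serverless endpoint for an existing model and wait until it is InService.

    Assumes `sm_client` is boto3.client("sagemaker") (or a compatible fake).
    """
    config_name = f"{endpoint_name}-config"  # hypothetical naming scheme
    sm_client.create_endpoint_config(
        **serverless_endpoint_config(config_name, model_name, memory_mb, max_concurrency)
    )
    resp = sm_client.create_endpoint(
        EndpointName=endpoint_name, EndpointConfigName=config_name
    )
    # Endpoint creation is asynchronous; block until it reports InService.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return resp["EndpointArn"]
```

Once the endpoint is `InService`, the `HuggingFacePredictor(endpoint_name=...)` approach from the earlier reply should be able to send requests to it, since the predictor only needs the endpoint name.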