Hitting Deployed Endpoint *Outside* of Notebook

All the tutorials tend to end at:

predictor.predict({"input": "YOUR_TEXT_GOES_HERE"})

It’s great that the notebooks deliver you to inference, but I have no idea how to hit this endpoint outside the context of a Jupyter notebook. I basically have AWS Java SDK code that does this:

AmazonSageMakerRuntime runtime = AmazonSageMakerRuntimeClientBuilder.defaultClient();

String body = "{\"instances\": [{\"data\": { \"input\": \"Hello World\"}}]}";

ByteBuffer bodyBuffer = ByteBuffer.wrap(body.getBytes());

InvokeEndpointRequest request = new InvokeEndpointRequest()
        .withEndpointName("huggingface-pytorch-training-....")
        .withBody(bodyBuffer);

InvokeEndpointResult invokeEndpointResult = runtime.invokeEndpoint(request);

Unfortunately, I get an error:

{
 "code": 400,
  "type": "InternalServerException",
  "message": "Content type  is not supported by this framework.\n\n            Please implement input_fn to to deserialize the request data or an output_fn to\n            serialize the response. For more information, see the SageMaker Python SDK README."
}

Am I missing something?

Hey @rosenjcb,

Thank you for opening this thread. Yes, you can use the endpoint with the AWS SDK; for this you can use the InvokeEndpoint method (Java doc).
It looks like you are already doing this, and there are only a few missing parts, I guess.
The endpoint expects JSON as the HTTP body, and as the error says, you are missing the Content-Type: application/json header for that.

I have to say I have no Java experience at all, but I found this on StackOverflow:

InvokeEndpointRequest invokeEndpointRequest = new InvokeEndpointRequest();
invokeEndpointRequest.setContentType("application/x-image");
ByteBuffer buf = ByteBuffer.wrap(image);

invokeEndpointRequest.setBody(buf);
invokeEndpointRequest.setEndpointName(endpointName);
invokeEndpointRequest.setAccept("application/json");

AmazonSageMakerRuntime amazonSageMaker = AmazonSageMakerRuntimeClientBuilder.defaultClient();
InvokeEndpointResult invokeEndpointResult = amazonSageMaker.invokeEndpoint(invokeEndpointRequest);

Maybe this helps you craft your request. Note that this example sets application/x-image because it is for an image model; for your endpoint the content type would be application/json.
You can also find an example of using the AWS SDK for Python (boto3) below:

        import boto3

        # "sagemaker-runtime" is the boto3 client for invoking deployed endpoints
        client = boto3.client("sagemaker-runtime")

        response = client.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Accept="application/json",
            Body=JSON_STRING,
        )

@philschmid I found your example and many others after an hour or so of digging. Once you get the model into SageMaker, the inference instructions are pretty easy to Google (you don’t even need to mention Hugging Face anymore, since it’s abstracted behind the SageMaker platform).

As you mentioned, I needed to set the Content-Type of the request to application/json, and I also needed to correct my query string to this: {"inputs": "Hello World"}. That’s it: no need to plaster on the instances or data structures. If you’re only sending one query, you can just pass the request along as you normally would when using the Hugging Face Inference API.
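For anyone else who lands here, the full corrected request ends up looking roughly like this (imports as in the snippets above, plus java.nio.charset.StandardCharsets; the endpoint name is still the truncated placeholder, so substitute your own):

AmazonSageMakerRuntime runtime = AmazonSageMakerRuntimeClientBuilder.defaultClient();

// Plain Hugging Face style payload; no "instances"/"data" wrapper needed
String body = "{\"inputs\": \"Hello World\"}";
ByteBuffer bodyBuffer = ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8));

InvokeEndpointRequest request = new InvokeEndpointRequest()
        .withEndpointName("huggingface-pytorch-training-....") // placeholder, use your endpoint's name
        .withContentType("application/json")                   // the header that was missing
        .withAccept("application/json")
        .withBody(bodyBuffer);

InvokeEndpointResult result = runtime.invokeEndpoint(request);
// The response body comes back as a ByteBuffer holding the JSON string
String responseJson = StandardCharsets.UTF_8.decode(result.getBody()).toString();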

Thanks for the help! The way this integrates into AWS without too much hassle is super important and will no doubt encourage NLP adoption across many teams.


Hello @rosenjcb,

Great to hear that you could solve it! And yes, the API contract is similar to the Hugging Face Inference API. You can find more information in the documentation reference.

Hello, I found this thread when searching for the same issue. I have deployed the sentence-transformers/all-MiniLM-L6-v2 model (sentence-transformers/all-MiniLM-L6-v2 · Hugging Face) as a SageMaker endpoint.

The predictor.predict method works for me. However, when using client.invoke_endpoint from another notebook, I get an error when I pass JSON, asking me to pass bytes. When I pass bytes, I get a model error as well. Any idea?