Hitting Deployed Endpoint *Outside* of Notebook

All the tutorials tend to end at:

predictor.predict({"input": "YOUR_TEXT_GOES_HERE"})

It’s great that the notebooks deliver you to inference, but I have no idea how to hit this endpoint outside the context of a Jupyter notebook. I basically have AWS Java SDK code that does this:

AmazonSageMakerRuntime runtime = AmazonSageMakerRuntimeClientBuilder.defaultClient();

String body = "{\"instances\": [{\"data\": { \"input\": \"Hello World\"}}]}";

ByteBuffer bodyBuffer = ByteBuffer.wrap(body.getBytes());

InvokeEndpointRequest request = new InvokeEndpointRequest()
        .withEndpointName("huggingface-pytorch-training-....")
        .withBody(bodyBuffer);

InvokeEndpointResult invokeEndpointResult = runtime.invokeEndpoint(request);

Unfortunately, I get an error:

{
 "code": 400,
  "type": "InternalServerException",
  "message": "Content type  is not supported by this framework.\n\n            Please implement input_fn to to deserialize the request data or an output_fn to\n            serialize the response. For more information, see the SageMaker Python SDK README."
}

Am I missing something?

Hey @rosenjcb,

Thank you for opening this thread. Yes, you can use the endpoint with the AWS SDK; for this you can use the InvokeEndpoint method (Java doc).
It looks like you are already doing this, and there are only a few missing parts, I guess.
The endpoint expects JSON as the HTTP body, and as the error says, you are missing the Content-Type: application/json header for that.

I have to say I have no Java experience at all, but I found this on StackOverflow:

InvokeEndpointRequest invokeEndpointRequest = new InvokeEndpointRequest();
invokeEndpointRequest.setContentType("application/x-image");
ByteBuffer buf = ByteBuffer.wrap(image);

invokeEndpointRequest.setBody(buf);
invokeEndpointRequest.setEndpointName(endpointName);
invokeEndpointRequest.setAccept("application/json");

AmazonSageMakerRuntime amazonSageMaker = AmazonSageMakerRuntimeClientBuilder.defaultClient();
InvokeEndpointResult invokeEndpointResult = amazonSageMaker.invokeEndpoint(invokeEndpointRequest);

Maybe this helps you craft your request. Note that this example sets application/x-image because it is for an image model; for your endpoint the content type would be application/json.
You can also find an example of using the AWS SDK for Python (boto3) below:

        import boto3

        # "sagemaker-runtime" is the boto3 client for invoking deployed endpoints
        client = boto3.client("sagemaker-runtime")

        response = client.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Accept="application/json",
            Body=JSON_STRING,
        )

@philschmid I found your example and many others after an hour or so of digging. Once you get the model into SageMaker, the inference instructions are pretty easy to Google (you don’t even need to mention Hugging Face anymore, since it’s abstracted behind the SageMaker platform).

As you mentioned, I needed to set the Content-Type of the request to application/json, and I also needed to correct my query string to this: {"inputs": "Hello World"}. That’s it: no need to plaster on the instances or data structures. If you’re only sending one query, you can just pass the request along as you normally would when using the Hugging Face Inference API.
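For anyone else who lands here, the full corrected request ends up looking roughly like this (imports as in the snippets above, plus java.nio.charset.StandardCharsets; the endpoint name is still the truncated placeholder, so substitute your own):

AmazonSageMakerRuntime runtime = AmazonSageMakerRuntimeClientBuilder.defaultClient();

// Plain Hugging Face style payload; no "instances"/"data" wrapper needed
String body = "{\"inputs\": \"Hello World\"}";
ByteBuffer bodyBuffer = ByteBuffer.wrap(body.getBytes(StandardCharsets.UTF_8));

InvokeEndpointRequest request = new InvokeEndpointRequest()
        .withEndpointName("huggingface-pytorch-training-....") // placeholder, use your endpoint's name
        .withContentType("application/json")                   // the header that was missing
        .withAccept("application/json")
        .withBody(bodyBuffer);

InvokeEndpointResult result = runtime.invokeEndpoint(request);
// The response body comes back as a ByteBuffer holding the JSON string
String responseJson = StandardCharsets.UTF_8.decode(result.getBody()).toString();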

Thanks for the help! The way this integrates into AWS without too much hassle is super important and will no doubt encourage NLP adoption across many teams.


Hello @rosenjcb,

Great to hear that you could solve it! And yes, the API contract is similar to the Hugging Face Inference API. You can find more information in the documentation reference.

Hello, I found this thread when searching for the same issue. I have deployed the sentence-transformers/all-MiniLM-L6-v2 model (sentence-transformers/all-MiniLM-L6-v2 · Hugging Face) as a SageMaker endpoint.

The predictor.predict method works for me. However, when using client.invoke_endpoint from another notebook, I get an error when I pass JSON, asking me to pass bytes. When I pass bytes, I get a model error as well. Any idea?