Payload too large for Async Inference on SageMaker

philhd · May 23, 2023, 9:51am

To transcribe audio files with the Whisper model, I assumed that the asynchronous inference option on AWS SageMaker would be the right choice for long audio files (around 1 hour, roughly 5 to 50 MB).

According to the docs, async inference supports payload sizes of up to 1 GB.

I followed Philipp Schmid's article.

However, I get the following error, which surprises me since my payload is only around 11 MB:

Received client error (413) from primary and could not load the entire response body


```python
import sagemaker
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.s3 import s3_path_join

sess = sagemaker.Session()
sagemaker_session_bucket = sess.default_bucket()  # bucket where async results are stored
role = sagemaker.get_execution_role()  # IAM role with permissions to create an endpoint

# Hub configuration: which model to load and which pipeline task to run
hub = {
    'HF_MODEL_ID': 'openai/whisper-base',
    'HF_TASK': 'automatic-speech-recognition'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,  # configuration for loading model from Hub
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version="4.26",  # transformers version used
    pytorch_version="1.13",  # pytorch version used
    py_version='py39',  # python version used
)

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    # Where our results will be stored
    output_path=s3_path_join("s3://", sagemaker_session_bucket, "async_inference/output"),
    # notification_config={
    #   "SuccessTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    #   "ErrorTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    # }, #  Notification configuration
)

# deploy the endpoint
huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # ml.g4dn.xlarge,
    async_inference_config=async_config
)
```

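```python
import boto3
import sagemaker
from sagemaker.async_inference.waiter_config import WaiterConfig
from sagemaker.huggingface import HuggingFacePredictor
from sagemaker.predictor_async import AsyncPredictor
from sagemaker.serializers import DataSerializer

endpoint_name = "<your-endpoint-name>"  # placeholder: name of the deployed async endpoint
audio_path = "sample.mp3"  # placeholder: local audio file
# assumption: raw-bytes serializer for audio, as in the Whisper examples
audio_serializer = DataSerializer(content_type="audio/x-audio")


def predict():
    session = boto3.session.Session()
    sagemaker_session = sagemaker.Session(session)

    predictor = HuggingFacePredictor(endpoint_name=endpoint_name,
                                     sagemaker_session=sagemaker_session,
                                     serializer=audio_serializer
                                     )
    async_predictor = AsyncPredictor(predictor)

    ASYNC_S3_PATH = "s3://async-inf/async-distilbert"

    with open(audio_path, "rb") as data_file:
        audio_data = data_file.read()

    data = {
        "s3_file": "s3://async-inf/async-distilbert"
        # "language": "pl"
    }
    res = async_predictor.predict_async(input_path="s3://async-inf/async-distilbert")
    # res = async_predictor.predict_async(data=audio_data, input_path=ASYNC_S3_PATH)
    config = WaiterConfig(
        max_attempts=5,  # number of attempts
        delay=10  # time in seconds to wait between attempts
    )

    result = res.get_result(config)  # polls S3 until the output exists
    print(result)
```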
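A minimal sketch of one way to make sure the async request points at a real object: upload the local audio file to S3 first and hand `predict_async` the full object URI (prefix plus file name with extension). The bucket, the default prefix and the helpers `build_input_key` / `upload_and_predict` are hypothetical, and passing `ContentType` through `initial_args` assumes the container decodes `audio/mpeg`.

```python
import os
import posixpath


def build_input_key(prefix: str, local_path: str) -> str:
    """S3 key for a local file: the prefix plus the file's own name (with extension)."""
    return posixpath.join(prefix.strip("/"), os.path.basename(local_path))


def upload_and_predict(async_predictor, local_path, bucket,
                       prefix="async_inference/input", content_type="audio/mpeg"):
    """Upload the audio file, then start an async prediction on that exact object."""
    import boto3  # imported lazily so build_input_key works without AWS libraries

    key = build_input_key(prefix, local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
    input_uri = f"s3://{bucket}/{key}"  # a full object URI, not a "directory" prefix
    # assumption: the container accepts this content type for MP3 input
    return async_predictor.predict_async(
        input_path=input_uri,
        initial_args={"ContentType": content_type},
    )
```

With this, something like `res = upload_and_predict(async_predictor, audio_path, "async-inf")` replaces the hard-coded `input_path`, and `res.get_result(config)` works as before.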

@philschmid Any idea how to post large payloads to the async endpoint?
Anyhow, thanks a lot for your tireless support. Very much appreciated.

philschmid · May 23, 2023, 10:24am

Hey @philhd,

There might be a minor error in the inference code: it seems that `s3://async-inf/async-distilbert` points only to a “directory”, not to a “file”.
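One quick way to rule this out is to check whether the URI names an actual object. A sketch using boto3's `head_object`; the helper names are hypothetical, and any `ClientError` (missing key or missing permission) is treated as "not an object".

```python
from __future__ import annotations

from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_s3_object(uri: str) -> bool:
    """True only if the URI names an existing object, not a prefix."""
    bucket, key = split_s3_uri(uri)
    if not key or key.endswith("/"):
        return False  # bucket root or "directory" prefix, not an object

    import boto3  # imported lazily so the checks above work without AWS libraries
    from botocore.exceptions import ClientError

    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
    except ClientError:
        return False  # 404 (no such key) or 403 (no permission)
    return True
```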

philhd · May 23, 2023, 10:29am

Hey @philschmid, thanks for the swift reply. `s3://async-inf/async-distilbert` is actually the file; I had just left out the file extension. I renamed it in the bucket and updated the code accordingly (`input_path="s3://async-inf/async-distilbert.mp3"`), but nothing changed.

philschmid · May 25, 2023, 12:27pm

Have you tried creating a custom `inference.py` script to log some information, e.g. whether the data file gets passed correctly into the handler?
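A minimal sketch of such a logging hook, assuming the Hugging Face inference toolkit convention of an optional `input_fn(input_data, content_type)` in `code/inference.py`. The fallback that wraps non-JSON bodies as `{"inputs": <raw bytes>}` for the ASR pipeline is an assumption, not a copy of the toolkit's own decoders.

```python
import json
import logging

logger = logging.getLogger("inference")
logger.setLevel(logging.INFO)
if not logger.handlers:
    logger.addHandler(logging.StreamHandler())  # stderr ends up in the endpoint's CloudWatch logs


def input_fn(input_data, content_type):
    """Log what actually reaches the container, then decode it for the pipeline."""
    size = len(input_data) if input_data is not None else 0
    logger.info("input_fn: content_type=%s payload_bytes=%d", content_type, size)

    base_type = (content_type or "").split(";")[0].strip().lower()
    if base_type == "application/json":
        return json.loads(input_data)
    # assumption: everything else is raw audio bytes for the ASR pipeline
    return {"inputs": input_data}
```

If `payload_bytes` never shows up in the logs, the request is being rejected before it reaches the handler, which points at a server-side limit rather than at the S3 path.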

philhd · May 25, 2023, 12:35pm

I got it up and running by doing it slightly differently:

```python
import boto3

endpoint_name = "<your-endpoint-name>"  # placeholder: name of the deployed async endpoint


def infer_async():
    sagemaker_runtime = boto3.client("sagemaker-runtime")

    # Location of the input: a JSON file that references the input audio file
    # (example in the 02_deploy_whisper-Async.ipynb notebook)
    input_location = "s3://async-inf/input.json"

    # After you deploy a model using SageMaker hosting services, client
    # applications use this API to get inferences from the model hosted at the
    # specified endpoint. The endpoint name must be unique within an AWS Region
    # in your AWS account.
    response = sagemaker_runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        # ContentType='audio/mpeg',
        InputLocation=input_location,
    )
    print(response)
```
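For completeness, a sketch of the full round trip with plain boto3: write `input.json` to S3, invoke the endpoint with that location, and wait for the result object that `invoke_endpoint_async` reports in `OutputLocation`. The JSON key `s3_location`, the bucket and key names, and both helpers are hypothetical; the key has to match whatever the endpoint's inference script reads.

```python
import json
from urllib.parse import urlparse


def make_input_document(audio_uri: str, key_name: str = "s3_location") -> bytes:
    """JSON request body telling the inference script where the audio lives."""
    return json.dumps({key_name: audio_uri}).encode("utf-8")


def invoke_and_wait(endpoint_name, bucket, input_key, audio_uri, delay=10, max_attempts=90):
    """Write input.json, invoke the async endpoint, and return the raw result bytes."""
    import boto3  # imported lazily so make_input_document works without AWS libraries

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=input_key,
        Body=make_input_document(audio_uri),
        ContentType="application/json",
    )

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{input_key}",
        ContentType="application/json",
    )

    # The result appears at OutputLocation once processing finishes; if the
    # request fails, the waiter gives up after max_attempts and raises.
    output = urlparse(response["OutputLocation"])
    out_bucket, out_key = output.netloc, output.path.lstrip("/")
    s3.get_waiter("object_exists").wait(
        Bucket=out_bucket,
        Key=out_key,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    return s3.get_object(Bucket=out_bucket, Key=out_key)["Body"].read()
```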

xolisani · June 6, 2023, 1:37pm

What's the structure of input.json?

xolisani · June 7, 2023, 1:58pm

What's the structure of input.json? I get an error saying “No such file or directory: 's3://”.

philhd · June 9, 2023, 7:00am

It depends on your inference script. You can try:

```json
{"s3_location": "path_to_s3"}
```

philhd · June 9, 2023, 7:04am

For async inference, there is another very important configuration required to prevent the 413 error:


```python
env = {
    'MMS_MAX_REQUEST_SIZE': '2000000000',  # bytes
    'MMS_MAX_RESPONSE_SIZE': '2000000000',  # bytes
    'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'  # seconds
}

HuggingFaceModel(env=env …)
```
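One detail worth spelling out: `env` is the container's whole environment configuration, so passing only the `MMS_*` settings would drop `HF_MODEL_ID` and `HF_TASK` from the original `hub` dict. A sketch that merges both before deploying; `build_env` and `deploy_async` are hypothetical helpers. As far as I understand it, the fix works because the async endpoint still POSTs the S3 object to the model server inside the container, and Multi Model Server's default request limit (6553500 bytes, roughly 6.5 MB, if I remember the defaults correctly) is smaller than an 11 MB file.

```python
HUB_CONFIG = {
    "HF_MODEL_ID": "openai/whisper-base",
    "HF_TASK": "automatic-speech-recognition",
}

MMS_LIMITS = {
    "MMS_MAX_REQUEST_SIZE": "2000000000",   # bytes, about 2 GB
    "MMS_MAX_RESPONSE_SIZE": "2000000000",  # bytes, about 2 GB
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "900",  # seconds, i.e. 15 minutes
}


def build_env(hub: dict, limits: dict) -> dict:
    """Merge model settings and server limits into one env dict (limits win on clashes)."""
    return {**hub, **limits}


def deploy_async(role, output_path, instance_type="ml.m5.xlarge"):
    """Deploy the Whisper model as an async endpoint with the raised server limits."""
    from sagemaker.async_inference import AsyncInferenceConfig
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=build_env(HUB_CONFIG, MMS_LIMITS),
        role=role,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        async_inference_config=AsyncInferenceConfig(output_path=output_path),
    )
```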

@philschmid
It would be nice to have this mentioned in the documentation.
