Async TEI Deployment Cannot Handle Requests Greater Than 2 MB

mavjan · September 20, 2024, 8:45pm

We are sending in this config:

    "HF_MODEL_ID": "BAAI/bge-m3",
    "DTYPE": "float16",
    "MAX_BATCH_TOKENS": "163840000",
    "MAX_CLIENT_BATCH_SIZE": "320000",
    "PAYLOAD_LIMIT": "200000000",
    "MAX_BATCH_REQUESTS": "30",
    "MMS_MAX_REQUEST_SIZE": "2000000000",
    "MMS_MAX_RESPONSE_SIZE": "2000000000",
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "900",

However, our endpoint fails on async inference requests with files larger than 2 MB, even though we set the payload limit higher and are using async inference.

Received client error (413) from primary with message "Failed to buffer the request body: length limit exceeded"

philschmid · November 4, 2024, 7:20am

Quick update: I dug into this a bit. You can increase the limit by setting the PAYLOAD_LIMIT env variable when creating a TEI endpoint. The default is 2000000 bytes (2 MB).
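As a rough illustration of what that setting means, here is a minimal sketch of a client-side size check. The helper `fits_payload_limit` is hypothetical (it is not part of TEI or the SageMaker SDK), and it assumes the limit is compared against the raw request body length in bytes and that a body exactly at the limit is accepted. It only models the TEI-side limit and says nothing about any other component sitting in front of the container.

```python
# Hypothetical helper for illustration; not part of TEI or the SageMaker SDK.
DEFAULT_PAYLOAD_LIMIT = 2_000_000  # TEI default mentioned above (2 MB)


def fits_payload_limit(body: bytes, env: dict) -> bool:
    """Return True if the request body is within PAYLOAD_LIMIT bytes.

    Assumption: the limit applies to the raw body length and is inclusive.
    """
    limit = int(env.get("PAYLOAD_LIMIT", DEFAULT_PAYLOAD_LIMIT))
    return len(body) <= limit


# Container environment with the raised limit from this thread (200 MB).
env = {"HF_MODEL_ID": "BAAI/bge-m3", "PAYLOAD_LIMIT": "200000000"}

body = b"x" * 3_000_000  # a 3 MB request body

print(fits_payload_limit(body, {}))   # False: exceeds the 2 MB default
print(fits_payload_limit(body, env))  # True: within the raised limit
```

Running a check like this before submitting a request can help tell apart a payload that is too large for the configured limit from a limit that is not being applied at all.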

mavjan · November 4, 2024, 7:47am

Unfortunately, we did try this: we set PAYLOAD_LIMIT to 200000000 (200 MB), but it did not resolve the errors.
