We are passing the following config to the endpoint:
"HF_MODEL_ID": "BAAI/bge-m3",
"DTYPE": "float16",
"MAX_BATCH_TOKENS": "163840000",
"MAX_CLIENT_BATCH_SIZE": "320000",
"PAYLOAD_LIMIT": "200000000",
"MAX_BATCH_REQUESTS": "30",
"MMS_MAX_REQUEST_SIZE": "2000000000",
"MMS_MAX_RESPONSE_SIZE": "2000000000",
"MMS_DEFAULT_RESPONSE_TIMEOUT": "900",
However, the endpoint fails on async inference requests when the input file is larger than 2 MB, even though we set the payload limits far above that and are using async inference. The error is:
Received client error (413) from primary with message "Failed to buffer the request body: length limit exceeded"
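
To show why this is unexpected, here is a minimal sketch that compares a payload size against the request-size limits from the config above. The helper name `exceeded_limits` and the 3 MiB test size are hypothetical, and the sketch assumes both values are byte counts that apply to the request body.

```python
# Request-size values copied from the config above (passed as strings).
# Assumption: both are byte limits on the incoming request body.
ENV = {
    "PAYLOAD_LIMIT": "200000000",
    "MMS_MAX_REQUEST_SIZE": "2000000000",
}


def exceeded_limits(payload_bytes: int, env: dict[str, str] = ENV) -> list[str]:
    """Return the names of the configured limits that this payload size exceeds."""
    return [name for name, limit in env.items() if payload_bytes > int(limit)]


if __name__ == "__main__":
    size = 3 * 1024 * 1024  # hypothetical 3 MiB input file
    print(exceeded_limits(size))  # prints []
```

With these values a 3 MiB file exceeds none of the configured limits, so the 413 suggests a limit of roughly 2 MB is still in effect somewhere, as if one of the settings were not being picked up.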