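For asynchronous inference (AsyncInference), there is another very important configuration required to prevent the HTTP 413 (Payload Too Large) error: raising the Multi Model Server (MMS) request/response size limits and its response timeout through environment variables.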
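from sagemaker.huggingface import HuggingFaceModel

env = {
    'MMS_MAX_REQUEST_SIZE': '2000000000',   # max request payload, in bytes (~2 GB)
    'MMS_MAX_RESPONSE_SIZE': '2000000000',  # max response payload, in bytes (~2 GB)
    'MMS_DEFAULT_RESPONSE_TIMEOUT': '900'   # response timeout, in seconds (15 min)
}
HuggingFaceModel(env=env, …)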
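The values in that `env` dict are plain strings holding byte counts and seconds, which makes them easy to mistype. Below is a minimal sketch of a small helper that builds the same three variables from numeric values and rejects non-positive numbers. `mms_async_env` is hypothetical and is not part of the SageMaker SDK. Its defaults simply mirror the values above and are not official recommendations.

```python
from typing import Dict


# Hypothetical helper (NOT part of the SageMaker SDK): builds the MMS
# environment variables from numeric values so the units stay explicit.
def mms_async_env(max_request_bytes: int = 2_000_000_000,
                  max_response_bytes: int = 2_000_000_000,
                  response_timeout_s: int = 900) -> Dict[str, str]:
    """Return MMS settings as a dict of strings, suitable for `env=`."""
    for name, value in (("max_request_bytes", max_request_bytes),
                        ("max_response_bytes", max_response_bytes),
                        ("response_timeout_s", response_timeout_s)):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # Container environment variables are passed as strings, so convert here.
    return {
        "MMS_MAX_REQUEST_SIZE": str(max_request_bytes),
        "MMS_MAX_RESPONSE_SIZE": str(max_response_bytes),
        "MMS_DEFAULT_RESPONSE_TIMEOUT": str(response_timeout_s),
    }


if __name__ == "__main__":
    print(mms_async_env())
```

The result would then be passed the same way as the hand-written dict, e.g. `HuggingFaceModel(env=mms_async_env(), …)`, with the remaining model arguments unchanged.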
@philschmid, it would be nice to have this mentioned in the documentation.