Concurrent requests to Hugging Face Inference Endpoints fail with a custom Docker deployment :-(

Hi coders,
I created a FastAPI server with several endpoints and deployed it on Hugging Face Inference Endpoints using the Docker image configuration. The deployment went fine, and now I'm at the testing stage:

  • When I send requests one after another, everything works without any errors.
  • BUT when I send multiple requests at once using Python requests and concurrent.futures (multithreading), most of the calls fail with the error: 503 Service Unavailable.
  • BUT when I deploy the same custom code using a custom handler.py, it does not throw any errors. I tested this by sending 100 requests simultaneously and it worked like a charm. Of course, the requests had to wait their turn, but none of them failed.
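
A concurrent test like the one described above could be reproduced with something like the following sketch. It is not the exact script used here: ENDPOINT_URL, the JSON payload, and the helper names send_one and run_burst are all hypothetical placeholders, and ThreadPoolExecutor is assumed as the concurrent.futures mechanism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical placeholder; replace with the real endpoint URL.
ENDPOINT_URL = "https://example.invalid/predict"


def send_one(i: int) -> int:
    """Send one POST and return the HTTP status code (0 on a connection error)."""
    # Imported here so run_burst can be tried with a stub sender
    # even when the third-party requests package is not installed.
    import requests

    try:
        # The payload shape is an assumption; use whatever the server expects.
        resp = requests.post(ENDPOINT_URL, json={"inputs": f"request {i}"}, timeout=60)
        return resp.status_code
    except requests.RequestException:
        return 0


def run_burst(send: Callable[[int], int], total: int = 100, workers: int = 20) -> Counter:
    """Call send(0..total-1) from a thread pool and count the returned status codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(send, range(total)))


# Example call against the real endpoint:
# print(run_burst(send_one))
```

Counting status codes this way makes it easy to compare the Docker deployment with the handler.py deployment under the same burst, for example by checking how many calls return 503 in each case.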

Can anybody explain to me what's wrong with the custom Docker image?

FYI: I tried a few things, but none of them worked.

  • Using uvicorn, tested with different concurrency settings; even that didn't work.
  • Using the requests module to call the APIs/endpoints.
  • Creating multiple replicas seems to help, since far fewer requests fail in that case, but it still doesn't work 100%.

Thanks for taking the time to read this post. I truly appreciate any suggestions or recommendations. Thanks again!! I'm hoping to hear back from someone.