My inference endpoint went from ~1 second to 20-30 seconds for the same example

Not sure how I could go to production, or even demo, with latency this unreliable.

Same roberta-base-squad2-ect model on:
AWS us-east-1
CPU · Intel Sapphire Rapids · 1x vCPU · 2 GB

Feb 23rd

Feb 23, 22:28:12 INFO 2025-02-23 22:28:12 - huggingface_inference_toolkit - - POST / Duration: 2097.84 ms
Feb 23, 22:29:02 INFO 2025-02-23 22:29:02 - huggingface_inference_toolkit - - POST / Duration: 1577.03 ms
Feb 23, 22:29:04 INFO 2025-02-23 22:29:04 - huggingface_inference_toolkit - - POST / Duration: 1662.39 ms
Feb 23, 22:30:00 INFO 2025-02-23 22:30:00 - huggingface_inference_toolkit - - POST / Duration: 2195.06 ms
Feb 23, 22:30:02 INFO 2025-02-23 22:30:02 - huggingface_inference_toolkit - - POST / Duration: 2127.71 ms
Feb 23, 22:36:23 INFO 2025-02-23 22:36:23 - huggingface_inference_toolkit - - POST / Duration: 2071.23 ms
Feb 23, 22:36:25 INFO 2025-02-23 22:36:25 - huggingface_inference_toolkit - - POST / Duration: 1929.39 ms
Feb 23, 22:37:36 INFO 2025-02-23 22:37:36 - huggingface_inference_toolkit - - POST / Duration: 1264.34 ms
Feb 23, 22:37:37 INFO 2025-02-23 22:37:37 - huggingface_inference_toolkit - - POST / Duration: 1226.00 ms

Feb 24

Feb 24, 22:15:28 INFO 2025-02-24 22:15:28 - huggingface_inference_toolkit - - POST / Duration: 31006.63 ms
Feb 24, 22:19:04 INFO 2025-02-24 22:19:04 - huggingface_inference_toolkit - - POST / Duration: 21957.62 ms
Feb 24, 22:24:56 INFO 2025-02-24 22:24:56 - huggingface_inference_toolkit - - POST / Duration: 21581.66 ms

They appear to be different libraries, so there may not be a direct relationship, but I found a similar issue related to Endpoints. I think it's unresolved…
There may be some kind of latent bug.

I’m not sure what you mean by different libraries. I’m sending the same context text and the same questions to the same Inference Endpoint model from one day to the next, and seeing this 10x performance decline.
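
For reference, this is roughly how I’m calling it (a minimal sketch; the endpoint URL, token, question, and context below are placeholders, not my actual values). It times each identical request on the client side so the numbers can be compared against the Duration values in the endpoint logs:

```python
import time

import requests

# Placeholder values -- substitute your own endpoint URL and HF token.
ENDPOINT_URL = "https://<your-endpoint>.us-east-1.aws.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

# Standard question-answering payload: the same question and context on every call.
payload = {
    "inputs": {
        "question": "What was said about revenue?",
        "context": "The same passage of transcript text, sent unchanged on every request ...",
    }
}

# Send the identical request several times and print client-side latency.
for i in range(5):
    start = time.perf_counter()
    response = requests.post(ENDPOINT_URL, headers=headers, json=payload, timeout=60)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"request {i}: HTTP {response.status_code}, {elapsed_ms:.2f} ms")
```

If the client-side timings jump the same way the logged Duration values do, the slowdown is on the endpoint side rather than somewhere in my request path.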

Today it is back to performing as expected:

Feb 25, 12:15:45 INFO 2025-02-25 12:15:45 - huggingface_inference_toolkit - - POST / Duration: 1011.06 ms
Feb 25, 12:16:13 INFO 2025-02-25 12:16:13 - huggingface_inference_toolkit - - POST / Duration: 1353.81 ms