50 ms inference, 500 ms latency


I have a small DistilBERT model serving a binary classifier. The logs report about 50 ms of inference time, but the round-trip latency I measure from my notebook is about 500 ms. I don't know how to solve this; I'm assuming the overhead sits somewhere between AWS and the model? A 10x gap like this isn't practical for my use case. Can it be fixed in any way?
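For reference, this is roughly how I'm timing the round trip (a minimal sketch; `invoke` here is a placeholder for whatever actually calls the endpoint, e.g. a `boto3` `invoke_endpoint` call wrapped in a lambda):

```python
import time

def measure_round_trip_ms(invoke, n=20):
    """Call `invoke` n times and return the median wall-clock latency in ms.

    `invoke` is a zero-argument callable that performs one request, e.g.
    lambda: client.invoke_endpoint(...)  # hypothetical, adapt to your client
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to cold-start outliers
```

Taking the median over several calls is deliberate: the first request often includes connection setup (DNS, TLS handshake) and any cold-start cost, so a single measurement can overstate the steady-state latency.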