50 ms inference, 500 ms latency


I have a small DistilBERT model serving a binary classifier. The logs report about 50 ms of inference time, but the round-trip latency I measure from my notebook is about 500 ms. I don't know how to solve this; I'm assuming the overhead sits somewhere between AWS and the model? A 10x gap like this isn't practical for my use case. Can it be fixed in any way?
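For reference, this is roughly how I'm timing the round trip (a minimal sketch; `invoke` here is a placeholder for whatever actually calls the endpoint, e.g. a `boto3` `invoke_endpoint` call wrapped in a lambda):

```python
import time

def measure_round_trip_ms(invoke, n=20):
    """Call `invoke` n times and return the median wall-clock latency in ms.

    `invoke` is a zero-argument callable that performs one request, e.g.
    lambda: client.invoke_endpoint(...)  # hypothetical, adapt to your client
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to cold-start outliers
```

Taking the median over several calls is deliberate: the first request often includes connection setup (DNS, TLS handshake) and any cold-start cost, so a single measurement can overstate the steady-state latency.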