Hi, it's me again.
I want to get a better sense of the system latency for the speech model I have deployed.
AWS provides Invocation Endpoint Metrics like ModelLatency, but that is more end-to-end. I am particularly interested in how much time is spent in the forward pass. I think I have two options:
1- The HuggingFace model logs preprocess, predict, and postprocess times here, but there is a bug, as shown below, where the predict time is not captured correctly.
2- Here, the Transform Fn metric is added to the context using the API described here.
My question is: where does the Transform Fn metric go? I am having a hard time finding it in CloudWatch. I will also submit a PR to fix the predict time logging.
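To make it concrete, this is roughly the per-stage breakdown I am after. It is only a minimal sketch: `timed`, `handle`, and the three stage callables are hypothetical names I made up for illustration, not the toolkit's actual code. It also assumes the predict call is synchronous; on a GPU the forward pass may return before the kernels finish, so the device would need to be synchronized before stopping the timer.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def handle(data, preprocess, predict, postprocess):
    """Run the three stages in order and time each one separately.

    The stage functions are placeholders (assumptions), standing in for
    whatever the real handler calls for each step.
    """
    timings = {}
    inputs, timings["preprocess_ms"] = timed(preprocess, data)
    # On GPU, synchronize the device inside `predict` before it returns,
    # otherwise this number can under-report the forward pass.
    outputs, timings["predict_ms"] = timed(predict, inputs)
    response, timings["postprocess_ms"] = timed(postprocess, outputs)
    return response, timings


if __name__ == "__main__":
    # Toy stages, just to show the shape of the result.
    response, timings = handle(
        " hello ",
        preprocess=lambda s: s.strip(),
        predict=lambda s: s.upper(),
        postprocess=lambda s: {"text": s},
    )
    print(response, timings)
```

Each stage gets its own start and stop time, so the predict number covers only the forward pass and not the pre/postprocessing around it.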
Thanks so much!
Deniz
