Hi, it’s me again.
I want to get a better sense of the system latency for the speech model I have deployed.
AWS provides invocation endpoint metrics such as ModelLatency, but that metric is closer to end-to-end. I am specifically interested in how much time is spent in the forward pass. I think I have two options:
1- HuggingFace model logs: these report per-request timings, including the postprocess times here, but there is a bug (as below) where the predict time is not captured correctly.
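For isolating the forward pass, one approach is to take a timestamp between each stage so that every duration is measured directly between two checkpoints. Nothing is derived by subtracting one stage's duration from another. The following is a minimal sketch that assumes the handler runs preprocess, predict, and postprocess in sequence. `timed_transform` and the three stage callables are hypothetical stand-ins, not the HuggingFace inference toolkit's actual API.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_transform(
    preprocess: Callable[[Any], Any],
    predict: Callable[[Any], Any],
    postprocess: Callable[[Any], Any],
    input_data: Any,
) -> Tuple[Any, Dict[str, float]]:
    """Run preprocess -> predict -> postprocess and time each stage in ms.

    HYPOTHETICAL helper: each duration is the gap between two consecutive
    perf_counter() checkpoints, so no stage time depends on another's.
    """
    t0 = time.perf_counter()
    processed = preprocess(input_data)
    t1 = time.perf_counter()
    predictions = predict(processed)  # the forward pass happens in this stage
    t2 = time.perf_counter()
    response = postprocess(predictions)
    t3 = time.perf_counter()

    timings = {
        "preprocess_ms": (t1 - t0) * 1000,
        "predict_ms": (t2 - t1) * 1000,
        "postprocess_ms": (t3 - t2) * 1000,
    }
    return response, timings


if __name__ == "__main__":
    def fake_predict(x: str) -> str:
        time.sleep(0.05)  # stand-in for the model forward pass
        return x

    out, t = timed_transform(str.strip, fake_predict, str.upper, "  hello ")
    print(out, {k: round(v, 2) for k, v in t.items()})
```

On a GPU, PyTorch launches CUDA kernels asynchronously. If predict returns before the kernels finish, the predict timing can look too small and the remaining work shows up in postprocess. Calling torch.cuda.synchronize() before the t2 checkpoint would make predict_ms reflect the actual forward pass.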
My question is: where does the Transform Fn logging go? I am having a hard time finding it in CloudWatch. I will also submit a PR to fix the predict time logging.
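SageMaker writes endpoint container logs to the CloudWatch Logs group /aws/sagemaker/Endpoints/<EndpointName>, so the handler's timing lines should appear there. The sketch below pulls those messages with boto3's filter_log_events paginator and collects the per-stage numbers. The exact log-line format is an ASSUMPTION: the regex expects something like "Predict time - 123.4 ms" and may need adjusting to match what the container actually prints. fetch_endpoint_messages and parse_timings are hypothetical helpers.

```python
import re
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List

# ASSUMPTION: timing lines look roughly like "Predict time - 123.4 ms".
# Adjust this pattern to whatever your container actually logs.
TIMING_RE = re.compile(
    r"(Preprocess|Predict|Postprocess) time\D*?(\d+(?:\.\d+)?)\s*ms"
)


def parse_timings(messages: Iterable[str]) -> Dict[str, List[float]]:
    """Collect every per-stage duration (in ms) found in the log messages."""
    found: Dict[str, List[float]] = defaultdict(list)
    for message in messages:
        for stage, value in TIMING_RE.findall(message):
            found[stage].append(float(value))
    return dict(found)


def fetch_endpoint_messages(endpoint_name: str, filter_pattern: str = '"time"') -> List[str]:
    """Pull log messages for a SageMaker endpoint from CloudWatch Logs.

    Needs boto3 and AWS credentials; searches all log streams in the
    endpoint's log group for events matching filter_pattern.
    """
    import boto3  # imported lazily so parse_timings works without boto3

    client = boto3.client("logs")
    paginator = client.get_paginator("filter_log_events")
    messages: List[str] = []
    for page in paginator.paginate(
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        filterPattern=filter_pattern,
    ):
        messages.extend(event["message"] for event in page["events"])
    return messages


if __name__ == "__main__":
    # Offline demo with sample lines; with credentials you could instead call
    # parse_timings(fetch_endpoint_messages("my-endpoint")) (hypothetical name).
    sample = ["Preprocess time - 1.2 ms", "Predict time - 85.0 ms", "Postprocess time - 0.4 ms"]
    for stage, values in parse_timings(sample).items():
        print(stage, median(values))
```

Taking the median over many requests, rather than reading a single log line, reduces the impact of cold-start and batching outliers on the forward-pass estimate.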
Thanks so much!