Hi, it's me again.
I want to get a better sense of the system latency for the speech model I have deployed.
AWS provides Endpoint Invocation Metrics like ModelLatency, but that is more end-to-end. I am particularly interested in how much time is spent in the forward pass. I think I have two options:
1- The HuggingFace model logs preprocess, predict, and postprocess times here, but there is a bug, as below, where the predict time is not captured correctly.
2- Here, the Transform Fn metric is added to the context using the API described here.
My question is: where does the Transform Fn metric go? I am having a hard time finding it in CloudWatch. I will also submit a PR to fix the predict time logging.
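To make the goal concrete, here is a minimal sketch of the kind of forward-pass timing I have in mind. It is not the toolkit's code: the helper name timed_predict and the log label PredictTimeMs are made up for illustration, and I am assuming the wrapped callable is whatever actually runs the model.

```python
import logging
import time

logger = logging.getLogger(__name__)


def timed_predict(predict, data):
    """Run predict(data) and return (result, elapsed milliseconds).

    Hypothetical helper: `predict` stands in for whatever callable runs
    the forward pass. On a GPU the work may be asynchronous, so the device
    would need to be synchronized before reading the clock; that step is
    omitted here to keep the sketch dependency-free.
    """
    start = time.perf_counter()
    result = predict(data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # The log label is an assumption, not a name the toolkit emits.
    logger.info("PredictTimeMs: %.3f", elapsed_ms)
    return result, elapsed_ms


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in for a model call: doubles its input.
    output, ms = timed_predict(lambda x: x * 2, 21)
    print(output, f"{ms:.3f} ms")
```

Something like this would only put the number in the container logs, so it does not answer where the Transform Fn metric ends up.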
Thanks so much!
Deniz