Hi
Is there a clean way to profile a model to get layer-wise latency and memory consumption?
I tried torch.profiler, but it only reports low-level, operator-level metrics. Since transformer modules boil down to matrix multiplications at the end of the day, it just says aten::mm takes up all the execution time and memory, without attributing it to a specific layer.
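For context, here is roughly what I ran (a minimal sketch with a stand-in model, not my actual code):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Stand-in for my model: a single transformer encoder layer
model = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
x = torch.randn(2, 16, 64)

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(x)

# Output is dominated by ops like aten::mm / aten::addmm, with no
# indication of which module (attention, feed-forward, ...) they came from.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```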
Any pointers would be helpful. Thank you!