Profiling models for execution time and memory

Hi

Is there a clean way to profile the models to get layerwise latency and memory consumption?

I tried torch.profiler, but it only reports low-level operator metrics. Since transformer modules boil down to matrix multiplications in the end, it just tells me that aten::mm accounts for nearly all of the execution time and memory, without attributing anything to individual layers.
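For reference, this is roughly the setup I used (a minimal sketch; the model and input shapes here are just placeholders for my actual transformer):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model/input just to illustrate the setup.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
x = torch.randn(8, 128, 512, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(x)

# The table is dominated by aten::mm / aten::bmm rather than per-layer entries.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```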

Any pointers would be helpful. Thank you!

Hi.

You can use NVIDIA's profilers, such as Nsight Compute (ncu) and Nsight Systems (nsys); both come with GUI applications.
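One way to get layer-level attribution with Nsight Systems is to wrap each layer's forward pass in an NVTX range, so the kernels launched by that layer show up grouped under a named range on the timeline. A minimal sketch, assuming a simple sequential model as a stand-in for your transformer:

```python
import torch

# Placeholder model; substitute your own transformer here.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).cuda()
x = torch.randn(32, 512, device="cuda")

# Push/pop an NVTX range around each submodule so nsys groups kernels by layer.
with torch.no_grad():
    out = x
    for name, layer in model.named_children():
        torch.cuda.nvtx.range_push(name)
        out = layer(out)
        torch.cuda.nvtx.range_pop()
```

Then run something like `nsys profile -o report python your_script.py` and open the report in the Nsight Systems GUI; the NVTX ranges appear on the timeline, so you can see which kernels and how much time belong to each layer. For detailed per-kernel compute and memory metrics, you can point Nsight Compute (ncu) at the specific kernels of interest.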