Profiling models on execution time and memory


Is there a clean way to profile the models to get layerwise latency and memory consumption?

I tried torch.profiler, but it only provides lower-level, per-operator metrics. Since transformer modules boil down to matrix multiplications, it just reports that aten::mm accounts for nearly all of the execution time and memory, without attributing it to individual layers.
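For context, a minimal sketch of the kind of torch.profiler run described above (the toy model and tensor shapes are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Hypothetical toy model standing in for a transformer block.
model = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
x = torch.randn(8, 64)

with profile(
    activities=[ProfilerActivity.CPU],
    profile_memory=True,   # track tensor memory allocated per op
    record_shapes=True,    # record input shapes for each op
) as prof:
    model(x)

# Grouping by operator collapses everything into aten::mm / aten::addmm,
# which is the behaviour described in the question.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```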

Any pointers would be helpful. Thank you!


You can use NVIDIA's profilers, such as Nsight Compute (ncu) and Nsight Systems (nsys); both come with GUI applications for inspecting the results.
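If you want a layer-wise breakdown without leaving PyTorch, one common approach is to attach forward hooks to each leaf module and time it yourself. A minimal sketch (the helper name and toy model are my own; on GPU you would call torch.cuda.synchronize() before each clock read, and could read torch.cuda.max_memory_allocated() per layer for memory):

```python
import time
import torch
import torch.nn as nn

def profile_layerwise(model, inp):
    """Record wall-clock latency per leaf module via forward hooks."""
    timings, starts, handles = {}, {}, []

    for name, module in model.named_modules():
        if list(module.children()):
            continue  # skip containers, time only leaf modules

        def pre_hook(mod, args, _name=name):
            # On GPU, add torch.cuda.synchronize() here for accurate timing.
            starts[_name] = time.perf_counter()

        def post_hook(mod, args, out, _name=name):
            timings[_name] = time.perf_counter() - starts[_name]

        handles.append(module.register_forward_pre_hook(pre_hook))
        handles.append(module.register_forward_hook(post_hook))

    with torch.no_grad():
        model(inp)
    for h in handles:
        h.remove()
    return timings

# Hypothetical toy model for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
timings = profile_layerwise(model, torch.randn(8, 64))
for name, dt in timings.items():
    print(f"{name}: {dt * 1e3:.3f} ms")
```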