Profiling models for execution time and memory

Hi

Is there a clean way to profile the models to get layerwise latency and memory consumption?

I tried torch.profiler, but it only reports low-level operator metrics. Since transformer modules boil down to matrix multiplications in the end, it just tells me that aten::mm accounts for nearly all of the execution time and memory, without attributing anything to individual layers.
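For reference, this is roughly the setup I used (a minimal sketch; the model and input shapes here are just placeholders for my actual transformer):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model/input just to illustrate the setup.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda()
x = torch.randn(8, 128, 512, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(x)

# The table is dominated by aten::mm / aten::bmm rather than per-layer entries.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```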

Any pointers would be helpful. Thank you!

Hi.

You can use NVIDIA's profilers, such as Nsight Compute (ncu) and Nsight Systems (nsys); both come with GUI applications.
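One way to get layer-level attribution with Nsight Systems is to wrap each layer's forward pass in an NVTX range, so the kernels launched by that layer show up grouped under a named range on the timeline. A minimal sketch, assuming a simple sequential model as a stand-in for your transformer:

```python
import torch

# Placeholder model; substitute your own transformer here.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).cuda()
x = torch.randn(32, 512, device="cuda")

# Push/pop an NVTX range around each submodule so nsys groups kernels by layer.
with torch.no_grad():
    out = x
    for name, layer in model.named_children():
        torch.cuda.nvtx.range_push(name)
        out = layer(out)
        torch.cuda.nvtx.range_pop()
```

Then run something like `nsys profile -o report python your_script.py` and open the report in the Nsight Systems GUI; the NVTX ranges appear on the timeline, so you can see which kernels and how much time belong to each layer. For detailed per-kernel compute and memory metrics, you can point Nsight Compute (ncu) at the specific kernels of interest.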