Parameters that contribute to GPU Memory

Hi,

I want to know exactly which parameters contribute to GPU memory usage.

https://www.microsoft.com/en-us/research/uploads/prod/2020/09/dnnmem.pdf

Please explain with one simple example.

Let’s assume I have a 1-layer feed-forward neural network with 100-dimensional input features and an output layer with 2 units, without bias.

So the total parameters that need GPU memory are:

  1. For a batch size of 8, a weight matrix of size 8×100 (100×2), and the weight gradient will be 100×2
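To make the example concrete, here is a minimal sketch of the tensor sizes involved, assuming float32 values (4 bytes per element). The function names are hypothetical and only for illustration. It counts only the input batch, the weight matrix, the weight gradient, and the output activation; it leaves out the output gradient, any optimizer state, the CUDA context, and allocator overhead, so the real GPU usage would be higher.

```python
BYTES_PER_FLOAT32 = 4  # assumption: all tensors are stored as float32


def tensor_bytes(*shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


def linear_layer_memory(batch_size, in_features, out_features):
    """Rough per-tensor byte counts for one bias-free linear layer."""
    return {
        # input batch: one row per sample
        "input": tensor_bytes(batch_size, in_features),
        # weight matrix: its shape does not depend on the batch size
        "weight": tensor_bytes(in_features, out_features),
        # weight gradient: same shape as the weight matrix
        "weight_grad": tensor_bytes(in_features, out_features),
        # output activation: one row per sample
        "output": tensor_bytes(batch_size, out_features),
    }


sizes = linear_layer_memory(batch_size=8, in_features=100, out_features=2)
for name, nbytes in sizes.items():
    print(f"{name:12s} {nbytes:6d} bytes")
print("total", sum(sizes.values()), "bytes")
```

In this sketch only the input (8×100) and the output (8×2) grow with the batch size, while the weight matrix and its gradient stay at 100×2.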

Is my understanding correct?