Parameters that contribute to GPU Memory


I want to know exactly which parameters contribute to GPU memory usage.

Please explain with one simple example.

Let’s assume I have a 1-layer feed-forward neural network with 100-dimensional input features and an output layer with 2 units, without bias.

So the total parameters that need GPU memory:

  1. For a batch size of 8: a weight matrix of size 8×100 (100×2), and a weight gradient of size 100×2
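
To make the count concrete, here is a rough sketch of the arithmetic I have in mind. It rests on assumptions that are mine and may be wrong: every tensor is float32 (4 bytes per element), the 8×100 tensor is the input batch, the 100×2 tensor is the weight, and the 8×2 output activations are also kept in memory. It deliberately ignores anything else that might occupy GPU memory (optimizer state, framework or allocator overhead), so it is only a lower-bound tally of these four tensors.

```python
# Rough tally of tensor sizes for a 1-layer network (100 inputs -> 2 outputs, no bias).
# Assumption: all tensors are float32, i.e. 4 bytes per element.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(*shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the number of bytes needed to store a dense tensor of this shape."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements * bytes_per_element


batch_size, in_features, out_features = 8, 100, 2

sizes = {
    "input batch (8 x 100)": tensor_bytes(batch_size, in_features),
    "weight (100 x 2)": tensor_bytes(in_features, out_features),
    "weight gradient (100 x 2)": tensor_bytes(in_features, out_features),
    # Assumption: the output activations are stored as well.
    "output activations (8 x 2)": tensor_bytes(batch_size, out_features),
}

total_bytes = sum(sizes.values())

for name, num_bytes in sizes.items():
    print(f"{name}: {num_bytes} bytes")
print(f"total: {total_bytes} bytes")
```

Under these assumptions the four tensors add up to 4864 bytes, of which the weight and its gradient account for 800 bytes each.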

Is my understanding correct?