Hi,
I want to know exactly which parameters contribute to GPU memory usage.
Please explain with one simple example.
Let's assume I have a 1-layer feed-forward neural network with 100-dimensional input features and an output layer with 2 units, without bias.
So the total parameters that need GPU memory are:
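A short sketch of the shapes involved. It assumes the batch is stacked as the rows of an input matrix $X$, that the bias-free layer computes $Y = XW$, and that $B$ (a symbol introduced here) is the batch size:

$$
X \in \mathbb{R}^{B \times 100},\quad W \in \mathbb{R}^{100 \times 2},\quad Y = XW \in \mathbb{R}^{B \times 2},\quad \frac{\partial L}{\partial W} = X^{\top}\frac{\partial L}{\partial Y} \in \mathbb{R}^{100 \times 2}
$$

Under these assumptions $W$ and its gradient have the same shape regardless of $B$, while $X$ and $Y$ grow linearly with the batch size.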
- For a batch size of 8: weight matrix of size 8×100 (100×2), weight gradient of size 100×2
Is my understanding correct?
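A minimal sketch of the arithmetic, assuming every tensor is stored as float32 (4 bytes per element). It reads the 8×100 matrix as the input batch and the 100×2 matrix as the weights (an interpretation), and it also counts the forward output Y as an extra assumption. `tensor_bytes` is a hypothetical helper written for this illustration, not a library function.

```python
# Minimal sketch: element and byte counts for a 1-layer, bias-free
# linear network (100 inputs -> 2 outputs).
# Assumption: all tensors are dense float32 (4 bytes per element).

BYTES_PER_FLOAT32 = 4


def tensor_bytes(*shape):
    """Hypothetical helper: bytes used by a dense float32 tensor of this shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_FLOAT32


batch_size, in_features, out_features = 8, 100, 2

memory = {
    # Input batch: grows with batch size; it is data, not a learnable parameter.
    "input X (batch x in)": tensor_bytes(batch_size, in_features),
    # The only learnable parameter: independent of batch size.
    "weight W (in x out)": tensor_bytes(in_features, out_features),
    # Gradient of the loss w.r.t. W: same shape as W.
    "weight gradient dL/dW (in x out)": tensor_bytes(in_features, out_features),
    # Assumption: the forward output Y = X @ W is also held in memory.
    "output Y (batch x out)": tensor_bytes(batch_size, out_features),
}

for name, nbytes in memory.items():
    print(f"{name}: {nbytes} bytes")
print(f"total: {sum(memory.values())} bytes")
```

In this sketch only W (100×2 = 200 values) is a trainable parameter. The sketch ignores anything else a real framework may allocate, such as optimizer state (for example, Adam keeps two extra buffers per parameter) and the GPU runtime's own overhead.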