transformers/src/transformers/trainer.py at main · huggingface/transformers (github.com)
I was looking at the code of trainer.py to understand how it works under the hood, and I saw that it calls model.zero_grad() rather than optimizer.zero_grad(). From the examples I've seen so far, we reset the gradients stored in the optimizer to zero so that the next backward pass computes fresh gradients instead of accumulating onto the old ones.
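To be clear about the pattern I mean, here is a minimal sketch of the training loop I've seen in tutorials (the model, data, and hyperparameters are just placeholders I made up):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and loss -- not from trainer.py.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(4, 10)   # dummy batch
targets = torch.randn(4, 1)

for step in range(3):
    optimizer.zero_grad()                     # reset gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                           # compute fresh gradients
    optimizer.step()                          # update parameters using those gradients
```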
Does model.zero_grad() reset the gradients too? If yes, when should I use one over the other?
I'm new to machine learning and deep learning in general, so I'm curious how this works and what key concept I might be missing.