transformers/src/transformers/trainer.py at main · huggingface/transformers (github.com)
I was looking at the code of trainer.py to understand how it works under the hood, and I saw that it calls model.zero_grad() rather than optimizer.zero_grad(). From the examples I've seen so far, we reset the gradients stored in the optimizer to zero so that the next backward pass computes fresh gradients instead of accumulating onto the old ones.
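To be clear about the pattern I mean, here is a minimal sketch of the training loop I've seen in tutorials (the model, data, and hyperparameters are just placeholders I made up):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and loss -- not from trainer.py.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(4, 10)   # dummy batch
targets = torch.randn(4, 1)

for step in range(3):
    optimizer.zero_grad()                     # reset gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                           # compute fresh gradients
    optimizer.step()                          # update parameters using those gradients
```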
Does model.zero_grad() reset the gradients too? If yes, when should I use one over the other?
I'm new to machine learning and deep learning in general, so I'm curious how this works and what key concept I might be missing.