How to extract gradients during training in PyTorch with the Trainer module?


I am quite familiar overall with the Trainer module and the models, yet it is not clear to me how to customize it to get gradient metrics such as the per-layer norm. What would be the best way?

Thanks in advance for your help!



I am having the same issue. I tried creating a custom callback to log gradients to a JSON file; however, the on_step_end hook is called after model.zero_grad in the training loop, which prevents logging any statistics on the gradients.

Do you have any idea on how to do it differently?

For reference, here is the code for my callback:

import json
import os

from transformers import TrainerCallback, TrainerControl, TrainerState, TrainingArguments


class GradientsCallback(TrainerCallback):
    def __init__(self, norm_type: float = 2.0):
        self.norm_type = float(norm_type)

    def on_step_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        if control.should_log:
            model = kwargs["model"]
            # Per-parameter gradient norms. This comes back empty, because
            # on_step_end runs after the gradients have been zeroed.
            grads = {
                n: p.grad.norm(self.norm_type).item()
                for n, p in model.named_parameters()
                if p.grad is not None
            }

            gradient_logging_file = os.path.join(args.logging_dir, "gradient_norms.json")

            try:
                with open(gradient_logging_file, "r") as f:
                    data = json.load(f)
            except (FileNotFoundError, json.JSONDecodeError):
                data = {}

            data[state.global_step] = grads
            with open(gradient_logging_file, "w") as f:
                json.dump(data, f)
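One possible workaround, sketched outside the Trainer with a toy model: PyTorch 2.0+ provides `Optimizer.register_step_pre_hook`, which fires just before `optimizer.step()`, while `.grad` is still populated. Since the Trainer drives a regular PyTorch optimizer, the same hook should be attachable to `trainer.optimizer` once it has been created; the model, names, and shapes below are only illustrative.

```python
import torch
from torch import nn

# Toy model/optimizer just to demonstrate the hook mechanism.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

grad_norms = {}  # step index -> {parameter name: gradient norm}


def record_grads(opt, args, kwargs):
    # Runs just before optimizer.step(), so .grad is still populated.
    step = len(grad_norms)
    grad_norms[step] = {
        name: p.grad.norm(2.0).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


handle = optimizer.register_step_pre_hook(record_grads)  # PyTorch >= 2.0

x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

print(sorted(grad_norms[0]))  # ['0.bias', '0.weight', '2.bias', '2.weight']
handle.remove()
```

The hook is removable via its handle, so it can be attached only for the steps you want to inspect.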



Unfortunately, I had to re-compute the gradients to log them via the callbacks, which is of course sub-optimal. It doesn’t seem to be possible unless you override some parts of the Trainer code. In my case, I needed the per-sample gradients (Per-sample-gradients — functorch 1.13 documentation), which are not directly available.
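For anyone landing here later: in recent PyTorch the functorch recipe lives under `torch.func`. A minimal per-sample-gradients sketch with a toy linear model (the model, shapes, and loss are only illustrative):

```python
import torch
from torch.func import functional_call, grad, vmap

model = torch.nn.Linear(4, 1)
params = dict(model.named_parameters())


def loss_fn(params, x, y):
    # functional_call evaluates the model with an explicit parameter dict,
    # which lets grad() differentiate with respect to it.
    pred = functional_call(model, params, (x.unsqueeze(0),)).squeeze(0)
    return torch.nn.functional.mse_loss(pred, y)


# vmap over the batch dimension: one gradient dict per sample.
per_sample_grads = vmap(grad(loss_fn), in_dims=(None, 0, 0))(
    params, torch.randn(8, 4), torch.randn(8, 1)
)
print(per_sample_grads["weight"].shape)  # torch.Size([8, 1, 4])
```

Each entry keeps the parameter's own shape with a leading batch dimension, so averaging over dim 0 recovers the usual batched gradient.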

I see, thanks for your answer.

I still feel it would be helpful to have at least access to the averaged gradients, for instance for debugging purposes. I think my solution will be to report to Weights & Biases (instead of TensorBoard), which logs gradient histograms.
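For reference, a sketch of the W&B route (assumes `wandb` is installed and you are logged in; the project name and output dir are made up, and `model` stands for your instantiated model):

```python
import wandb
from transformers import TrainingArguments

wandb.init(project="grad-debug")  # hypothetical project name

args = TrainingArguments(
    output_dir="out",
    report_to="wandb",  # route Trainer logs to W&B
)

# wandb.watch registers hooks that log gradient histograms,
# sampled every `log_freq` backward passes.
wandb.watch(model, log="gradients", log_freq=100)
```

After this, train as usual with a Trainer built from these arguments; the gradient histograms appear per-parameter in the W&B run page.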

Hi, I got the same problem and modified the file; in my case it was located at: /usr/local/lib/python3.9/dist-packages/transformers/

Here is my new file; I edited lines 37 and 1884 to plot the gradients every 1000 steps.

Save a copy of the original first, because this modified file always plots the gradients when the train method is called.