Logging text from model outputs with TensorBoard

Hello,

I would like to log text generated during training with the Trainer class to TensorBoard. I’m looking into the TensorBoardCallback class, but it seems like I can’t access the model outputs easily. I came up with a solution, but it seems quite hacky:

from transformers import Seq2SeqTrainer

class CustomTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        # stash the logits on the trainer state so the callback can read them later
        self.state.logits = outputs["logits"]
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        return (loss, outputs) if return_outputs else loss

The stashed logits are then read in an overridden on_log:

import random

import torch
from transformers import TrainingArguments, TrainerState, TrainerControl
from transformers.integrations import TensorBoardCallback, rewrite_logs

class CustomCallback(TensorBoardCallback):
    def on_log(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        logs=None,
        **kwargs,
    ):
        if not state.is_world_process_zero:
            return

        # greedy-decode the logits stashed by CustomTrainer and pick a random sample
        logits = state.logits
        preds = torch.argmax(logits, dim=-1)
        idx = random.randint(0, logits.shape[0] - 1)
        pred_text = kwargs["tokenizer"].batch_decode(preds, skip_special_tokens=True)[idx]

        del state.logits

        if self.tb_writer is None:
            self._init_summary_writer(args)

        if self.tb_writer is not None:
            self.tb_writer.add_text("preds", pred_text, global_step=state.global_step)
            logs = rewrite_logs(logs)
            for k, v in logs.items():
                if isinstance(v, (int, float)):
                    self.tb_writer.add_scalar(k, v, state.global_step)
                else:
                    print(
                        "Trainer is attempting to log a value of "
                        f'"{v}" of type {type(v)} for key "{k}" as a scalar. '
                        "This invocation of Tensorboard's writer.add_scalar() "
                        "is incorrect so we dropped this attribute."
                    )
            self.tb_writer.flush()

Is there another way to retrieve outputs from my model within on_log?


bumping for visibility

I am bumping again!

I’m looking for a way to log additional loss components during training.
My model outputs the total loss along with several individual loss components, and I’d like to log those as well.
To do this in a way that respects gradient accumulation and logging_steps, it seems I would have to modify not only training_step and compute_loss but also _inner_training_loop.

Is there a more streamlined approach to accomplish this?

I created a way to log extra losses during training.

As far as I can tell, the only place where the model outputs are available is the compute_loss function, which is also what my workaround uses. However, I stash the values on the control variable instead of the state, because the control variable is meant to be writable from callbacks.

You can find my implementation here:
https://github.com/naba89/custom_hf_trainer

I know I’m a few months late, but this might be useful for other people.
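
Roughly, the idea looks like this. This is only a minimal sketch of the control-variable approach, not the code from the repo above: the "loss_components" output key and the class names are invented for illustration, and it assumes the model returns its extra losses as scalar tensors in a dict-like output.

from transformers import Seq2SeqTrainer
from transformers.integrations import TensorBoardCallback


class ExtraLossTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        # Accumulate the extra components on the control object; unlike `state`,
        # `control` is the object that callbacks are allowed to write to.
        buf = getattr(self.control, "extra_losses", {})
        for name, value in outputs.get("loss_components", {}).items():
            buf[name] = buf.get(name, 0.0) + value.detach().float().item()
        buf["_count"] = buf.get("_count", 0) + 1
        self.control.extra_losses = buf
        return (loss, outputs) if return_outputs else loss


class ExtraLossCallback(TensorBoardCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        # Let the regular TensorBoard logging run first (this also initializes
        # self.tb_writer on the main process).
        super().on_log(args, state, control, logs=logs, **kwargs)
        extra = getattr(control, "extra_losses", None)
        if not state.is_world_process_zero or not extra or self.tb_writer is None:
            return
        # Average over the micro-batches seen since the last log, which roughly
        # mirrors how the Trainer averages its own loss across logging_steps and
        # gradient accumulation (values here are per-process, not gathered).
        count = max(extra.pop("_count", 1), 1)
        for name, total in extra.items():
            self.tb_writer.add_scalar(name, total / count, state.global_step)
        control.extra_losses = {}

If you register the callback yourself (callbacks=[ExtraLossCallback]), you will probably also want report_to="none" in your TrainingArguments so the default TensorBoardCallback doesn’t write the same standard scalars a second time. The sketch also doesn’t distinguish training from evaluation batches, so treat it as a starting point rather than a drop-in.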

I had a very unstable training script where I needed to track GPU memory and compute utilization across steps to make sure the model had enough memory for the required computations. Here is how I did it:

  1. Create a TensorBoardCallback class to track GPU utilization
import GPUtil
from transformers.integrations import TensorBoardCallback

class GpuTensorboardCallback(TensorBoardCallback):
    def _compute_gpu_utilization(self):
        gpus = GPUtil.getGPUs()
        # average GPU compute utilization across the visible GPUs
        avg_gpu_load = sum(gpu.load for gpu in gpus) / len(gpus)
        # average GPU memory utilization across the visible GPUs
        avg_mem_util = sum(gpu.memoryUtil for gpu in gpus) / len(gpus)
        return avg_gpu_load, avg_mem_util

    def on_step_end(self, args, state, control, **kwargs):
        # for safety, check that the TensorBoard writer is initialized
        if self.tb_writer is None:
            self._init_summary_writer(args)
        # log the GPU utilization as percentages
        avg_gpu_compute, avg_gpu_memory = self._compute_gpu_utilization()
        self.tb_writer.add_scalar(
            tag="GPU Utilization / Compute",
            scalar_value=avg_gpu_compute * 100,
            global_step=state.global_step,
        )
        self.tb_writer.add_scalar(
            tag="GPU Utilization / Memory",
            scalar_value=avg_gpu_memory * 100,
            global_step=state.global_step,
        )
  2. Then, add the callback to your Trainer:
# Create the trainer and register the callback
trainer = Seq2SeqTrainer(
    model=...,
    tokenizer=...,
    # ... other Trainer arguments ...
    callbacks=[GpuTensorboardCallback],
)
  3. Now you can track your GPU utilization during training in TensorBoard.

Hope this is useful!