In the example scripts (e.g. `complete_cv_example.py` in the huggingface/accelerate GitHub repo), a `total_loss` variable is used to compute the average loss over the training datapoints, which is then logged via `accelerator.log`.
Is the resulting metric process-specific, or is the loss somehow aggregated across processes?
If it is the former (i.e. the metric is the average loss on a single process), is there a recommended way to compute the metric across all processes during training? I assume `gather_for_metrics` could be used, but does it add any communication cost?