In the example scripts (e.g. accelerate/complete_cv_example.py at main · huggingface/accelerate · GitHub), a variable total_loss is used to compute the average loss over the training datapoints, which is then logged via accelerator.log.
Is the resulting metric process-specific, or is the loss somehow aggregated across processes?
If it's the former (i.e. the metric is the average loss for a single process), is there a recommended way to compute the metric across all processes during training? I assume gather_for_metrics could be used (roughly as in the sketch below), but would that add any extra cost?
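In case it helps, here's roughly what I had in mind; a minimal sketch only, assuming accelerator.reduce / gather_for_metrics behave as documented. The dataloader and loss below are dummy stand-ins, not the example script's code:

```python
# Rough sketch of aggregating the per-process average loss before logging.
# The dataset and loss are placeholders just so the loop runs.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # trackers omitted; accelerator.log would need log_with=...

dataset = TensorDataset(torch.randn(32, 4))
train_dataloader = accelerator.prepare(DataLoader(dataset, batch_size=8))

for epoch in range(2):
    total_loss = torch.tensor(0.0, device=accelerator.device)
    for (x,) in train_dataloader:
        loss = (x ** 2).mean()  # stand-in for the real forward pass
        total_loss += loss.detach().float()

    per_process_avg = total_loss / len(train_dataloader)

    # One all-reduce on a scalar: averages the metric across all processes.
    global_avg = accelerator.reduce(per_process_avg, reduction="mean")

    # Alternative: gather each process's value and average locally.
    # global_avg = accelerator.gather_for_metrics(per_process_avg).mean()

    accelerator.print(f"epoch {epoch}: train_loss {global_avg.item():.4f}")
    # With trackers configured this would instead be:
    # accelerator.log({"train_loss": global_avg.item()}, step=epoch)
```

My guess is that reducing a single scalar per epoch is cheap compared to the training step itself, but I'd like to confirm that.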
Thanks!