Clarification on training metrics

In the example scripts (e.g. accelerate/ at main · huggingface/accelerate · GitHub), a variable total_loss is used to compute the average loss over the training datapoints, which is then logged via accelerator.log.

Is the resulting metric process-specific, or is the loss somehow aggregated across processes?

In the former case (i.e. if the metric is the average loss for a single process), is there a recommended way to compute the metric across all processes during training? I assume gather_for_metrics could be used, but would this incur any additional cost?
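To make the distinction concrete, here is a minimal sketch in plain Python (simulating two processes with hypothetical loss values, no accelerate involved) of how a per-process average can differ from the average over all datapoints that a cross-process gather would produce:

```python
# Simulate loss accumulation on 2 "processes" (hypothetical values).
# Each process sees a different shard of the data, so its local
# average loss generally differs from the others.
process_losses = [
    [0.9, 0.7, 0.5],  # losses seen by process 0
    [1.3, 1.1, 0.9],  # losses seen by process 1
]

# What a total_loss accumulator computes on each rank: a process-local average.
local_averages = [sum(losses) / len(losses) for losses in process_losses]

# What a cross-process metric would be: the average over ALL datapoints,
# as obtained after gathering the losses from every rank.
all_losses = [loss for losses in process_losses for loss in losses]
global_average = sum(all_losses) / len(all_losses)

print(local_averages)  # each rank would log a different number
print(global_average)  # one shared number across ranks
```

If each rank logs only its local average, the logged curves can diverge between ranks even though training is synchronized; gathering (or reducing) first yields a single consistent metric.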