Trackers in a distributed setting (single-node or multi-node DDP)

In a distributed setting, should the trackers (for instance accelerate.tracking.TensorBoardTracker) be defined only by the main process (i.e. wrapped inside an accelerator.is_main_process condition) or by all the processes?

For reference, wandb allows both, with slightly different outputs: see the "Distributed Training" page of the wandb documentation.

Accelerate only supports trackers on the main process. If there is a need or desire to log from all of the processes, we can support that, but for now all of the logging functionality is purposefully limited to the main process.

Also, if you are installing Accelerate from the main branch, you don't need to wrap the tracker initialization in if accelerator.is_main_process. (This change will be included in the next release.)
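To illustrate why the explicit guard becomes unnecessary once the check lives inside the tracker, here is a minimal sketch of the pattern. It is not Accelerate's actual implementation: `MainProcessOnlyTracker` and `is_main_process` are hypothetical names made up for this example, and the sketch assumes the launcher exports a `RANK` environment variable (as torchrun does), treating an unset `RANK` as a single-process run.

```python
import os


def is_main_process() -> bool:
    """Return True on global rank 0 (assumption: launcher sets RANK)."""
    # If RANK is unset we assume a plain single-process run.
    return int(os.environ.get("RANK", "0")) == 0


class MainProcessOnlyTracker:
    """Hypothetical tracker that silently does nothing off the main process."""

    def __init__(self) -> None:
        # Decide once, at construction time, whether this process logs.
        self.active = is_main_process()
        self.records = []  # stand-in for a real backend such as TensorBoard

    def log(self, values: dict, step: int) -> None:
        if not self.active:
            return  # non-main ranks skip logging entirely
        self.records.append((step, dict(values)))


# Every process runs the same code; no `if` guard is needed at the call site.
tracker = MainProcessOnlyTracker()
tracker.log({"train_loss": 0.5}, step=0)
```

Because the rank check is made inside the tracker, the training script can call it unconditionally from every process, and only the main process ends up writing anything.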
