In a distributed setting, should trackers (for instance `accelerate.tracking.TensorBoardTracker`) be defined only on the main process (i.e. wrapped inside an `if accelerator.is_main_process` condition) or on all processes?
For reference, wandb allows both with slightly different outputs: Distributed Training - Documentation
Accelerate only supports tracking on the main process. If there is a need or desire to log across all of them, we can support that, but all of the logging functionality is purposefully limited to just the main process.
Also, you don't need to wrap the init in `if accelerator.is_main_process` specifically if you are building off main. (This will be propagated to the next release.)
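To illustrate the idea of limiting logging to the main process, here is a minimal standard-library sketch. It is not Accelerate's implementation: `is_main_process`, `main_process_only`, and `ToyTracker` are hypothetical names made up for this example, and it assumes the launcher exports a `RANK` environment variable (as `torchrun` does), with an unset `RANK` treated as a single-process run.

```python
import functools
import os


def is_main_process() -> bool:
    # Assumption: the launcher sets RANK; if it is unset we treat the run as single-process.
    return int(os.environ.get("RANK", "0")) == 0


def main_process_only(fn):
    """Call fn on rank 0 only; on every other rank do nothing and return None."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if is_main_process():
            return fn(*args, **kwargs)
        return None

    return wrapper


class ToyTracker:
    """Hypothetical stand-in for a tracker; keeps records in memory instead of writing files."""

    def __init__(self):
        self.records = []

    @main_process_only
    def log(self, values: dict, step: int) -> None:
        # Reached only on the main process, so callers need no rank check of their own.
        self.records.append((step, dict(values)))
```

With the gate inside the tracker, the training loop can call `tracker.log(...)` on every rank without its own `if` check, which is the same convenience described above for Accelerate's tracking calls.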