Tracker in distributed setting (single node DDP or multinode DDP)

VictorSanh · August 29, 2022, 7:26pm

In a distributed setting, should trackers (for instance accelerate.tracking.TensorBoardTracker) be created only by the main process (i.e. wrapped inside an accelerator.is_main_process condition) or by all processes?

For reference, wandb allows both, with slightly different outputs (see the wandb "Distributed Training" documentation).

muellerzr · August 29, 2022, 9:09pm

Accelerate only supports tracking on the main process. If there is a need or desire to log from all processes, we can support that, but all of the logging functionality is purposefully limited to the main process.
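To make the main-process-only policy concrete, here is a minimal, stdlib-only sketch of the idea. It is not Accelerate's implementation: the MainProcessOnlyTracker class, the simulate helper and the JSON-lines output file are hypothetical, and the sketch assumes the global rank comes from the RANK environment variable that torchrun sets for each worker.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class MainProcessOnlyTracker:
    """Hypothetical tracker: every rank may call log(), but only
    global rank 0 actually writes anything."""

    def __init__(self, log_path: str, rank: Optional[int] = None) -> None:
        # torchrun exports RANK per worker; fall back to 0 so a plain
        # single-process run still logs.
        if rank is None:
            rank = int(os.environ.get("RANK", "0"))
        self.rank = rank
        self.is_main_process = rank == 0
        self.log_path = log_path

    def log(self, values: Dict[str, Any], step: int) -> None:
        if not self.is_main_process:
            return  # non-main ranks drop their values instead of writing
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"step": step, **values}) + "\n")


def simulate(world_size: int, log_path: str) -> None:
    # Stand-in for N launched processes: each builds a tracker and logs.
    for rank in range(world_size):
        tracker = MainProcessOnlyTracker(log_path, rank=rank)
        tracker.log({"loss": 0.25, "rank": rank}, step=1)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metrics.jsonl")
        simulate(world_size=4, log_path=path)
        with open(path, encoding="utf-8") as f:
            print(f.read().strip())  # a single line, written by rank 0
```

Because every rank runs the same code and only rank 0 writes, the training script needs no rank checks around its logging calls, which is the trade-off a main-process-only design makes: simpler scripts, but metrics from other ranks are not recorded unless they are reduced to the main process first.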

Also, you don't need to wrap tracker initialization (init_trackers) in an if accelerator.is_main_process check if you are installing Accelerate from the main branch. (This will be propagated to the next release.)
