In Accelerate we report only on the main process, as shown in `examples/pytorch/text-classification/run_glue_no_trainer.py` on the `main` branch of huggingface/transformers.
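Roughly, that comes down to guarding the wandb calls with Accelerate's main-process check. A minimal sketch (not the exact code from that example; the project name and logged values are placeholders):

```python
# Sketch of the Accelerate pattern: only the main process talks to wandb.
import wandb
from accelerate import Accelerator

accelerator = Accelerator()

if accelerator.is_main_process:
    wandb.init(project="my-project")  # placeholder project name

for step in range(10):
    loss = 0.0  # stand-in for a real training step
    if accelerator.is_main_process:
        wandb.log({"loss": loss, "step": step})
```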
For non-Accelerate, I think it'd look something like:

    import os

    if int(os.environ.get("LOCAL_RANK", -1)) == 0:
        # do any wandb thing here, on a single process per node
        ...

and do that every single time you touch wandb.
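To avoid sprinkling that check everywhere, you could wrap it in a small helper. This is just a sketch: the helper name is mine, and I've also treated `-1` as "main" so it still logs when you're not running distributed at all:

```python
import os

import wandb


def is_local_main_process() -> bool:
    """Main process of each node under torch.distributed.run/launch;
    also True when LOCAL_RANK is unset (not running distributed)."""
    return int(os.environ.get("LOCAL_RANK", -1)) in (-1, 0)


if is_local_main_process():
    wandb.init(project="my-project")  # placeholder project name

# ... training loop ...
if is_local_main_process():
    wandb.log({"loss": 0.0})  # stand-in for real metrics
```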
Note that this is specific to torch.distributed.run/launch (which sets the LOCAL_RANK environment variable), so it won't work this way on TPUs. Either way, the key is to check that you are on the main process of the local node.
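If you do need something equivalent on TPU, torch_xla has its own notion of a master ordinal; something along these lines should be in the right direction, though treat it as an untested sketch:

```python
# Sketch for TPU: LOCAL_RANK isn't set there, so check the XLA ordinal
# instead (requires torch_xla).
import torch_xla.core.xla_model as xm

if xm.is_master_ordinal(local=True):
    ...  # wandb calls for the main process on this host
```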
Since you're very interested in inner workings like this, I'd recommend giving Accelerate a peek if possible. It sounds like you may like what we've done there.
(Though transformers does not use Accelerate internally right now for this type of stuff)