How to run an end-to-end example of distributed data parallel with Hugging Face's Trainer API (ideally on a single node with multiple GPUs)?

In Accelerate we report only on the main process, as shown here: transformers/examples/pytorch/text-classification/run_glue_no_trainer.py at main · huggingface/transformers · GitHub
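Roughly, the pattern on the Accelerate side is something like the following (a minimal sketch, not the full example script; the actual wandb calls are up to you):

from accelerate import Accelerator

accelerator = Accelerator()

if accelerator.is_main_process:
    # main process across all machines: init trackers, log metrics, save final artifacts, ...
    ...

if accelerator.is_local_main_process:
    # main process on this particular machine: e.g. progress bars
    ...

accelerator.print("printed once, from the main process only")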

For non-Accelerate code, I think it’d look something like:

import os
if int(os.environ.get("LOCAL_RANK", -1)) == 0:
    # do any wandb setup/logging on this node's main process only
    ...

and do that check every single time you call into wandb.

Note that this is specific to torch.distributed.run/launch, so it won’t work this way on TPUs. The key point is that you need to check that you are on the main process of the local/main node.
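If you are going through the Trainer API itself, it also exposes helpers that wrap these rank checks (and should cover the TPU case too). A small sketch, assuming `trainer` is an already-built transformers.Trainer:

if trainer.is_world_process_zero():
    # global main process: a safe place for wandb logging, saving final artifacts, ...
    ...

if trainer.is_local_process_zero():
    # main process on this node: e.g. per-node progress bars
    ...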

Since you’re clearly interested in inner workings like this, I’d recommend giving Accelerate a peek if possible. It sounds like you may like what we’ve done there :slight_smile:

(Though transformers does not use Accelerate internally right now for this type of stuff)
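To come back to the original question: an end-to-end single-node multi-GPU run with the Trainer is mostly just a regular training script launched with torchrun; the Trainer reads LOCAL_RANK itself and wraps the model in DistributedDataParallel. A rough sketch (the model, dataset, and hyperparameters below are only illustrative placeholders):

# hypothetical file: run_ddp_example.py
# launch on one node with e.g.: torchrun --nproc_per_node=4 run_ddp_example.py
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    # MRPC is a sentence-pair task, so tokenize both sentences together
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

dataset = raw.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

args = TrainingArguments(
    output_dir="mrpc-ddp",
    per_device_train_batch_size=16,   # per GPU, so 4 GPUs -> effective batch size 64
    num_train_epochs=1,
    report_to="none",                 # or "wandb"; the Trainer only reports from the main process
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,              # also gives you a padding data collator by default
)

trainer.train()

Each process launched by torchrun runs this same script on its own GPU; the Trainer takes care of the distributed sampler and gradient synchronization for you.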
