How to run single-node, multi-GPU training with HF Trainer?

I think the docs on single-node, multi-GPU training with Trainer are insufficient. See my questions in this thread: "Using Transformers with DistributedDataParallel — any examples?"
