Does the HF Trainer class support multi-node training? Or only single-host multi-GPU training?
It supports both single-node and multi-node distributed training with the PyTorch launcher (torch.distributed.launch, or torchrun on recent PyTorch versions).
Thanks! Who is doing the cross-instance allreduce? PyTorch DDP? Horovod? Or some custom HF allreduce? Any sample?
It’s standard PyTorch DDP (torch.nn.parallel.DistributedDataParallel) behind the scenes: when Trainer detects a distributed launch, it wraps the model in DDP, and DDP performs the gradient allreduce across nodes (over NCCL on GPUs).
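For a sample, here is a minimal sketch (the model, dataset, and hyperparameters are placeholders, not anything specific from this thread). You launch the same script on every node with torchrun; Trainer reads the environment variables the launcher sets and handles the DDP wrapping itself, so there is no explicit distributed code in the script:

```python
# Minimal multi-node Trainer sketch (illustrative only).
#
# Run the same command on each node, changing only --node_rank:
#   torchrun --nnodes=2 --nproc_per_node=8 --node_rank=0 \
#       --master_addr=10.0.0.1 --master_port=29500 train.py
#
# torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, etc.; Trainer picks these
# up and wraps the model in DistributedDataParallel, which does the
# cross-node gradient allreduce.

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Small slice of a public dataset, just for illustration.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,  # per GPU; global batch = this * world size
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```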