Why is evaluation run on multiple nodes in distributed mode?

VicDon · June 25, 2023, 5:33am

It seems evaluation originally ran on a single node and was later converted to run on multiple nodes (see the issue). This requires fairly complex operations to gather the output tensors and to move data between device and host. I was wondering why evaluating on multiple nodes is better than evaluating on one node: is it purely for efficiency, or might there be other reasons?
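
To make the gather step concrete, here is a minimal pure-Python sketch of sharded evaluation. It only simulates the ranks in one process and is not taken from the Trainer source: the function names (`shard_indices`, `evaluate_shard`, `gather_and_truncate`) and the wrap-around padding scheme are assumptions for illustration. It shows why the outputs have to be interleaved and truncated after gathering so that they match a single-process evaluation.

```python
def shard_indices(num_samples, world_size, rank):
    """Indices evaluated by `rank` (hypothetical helper).

    The index list is padded by wrapping around to the start so that
    every rank receives the same number of samples.
    """
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return padded[rank:total:world_size]


def evaluate_shard(dataset, indices, predict):
    """Run the model (here just a function) on one rank's shard."""
    return [predict(dataset[i]) for i in indices]


def gather_and_truncate(per_rank_outputs, num_samples):
    """Interleave the per-rank outputs back into dataset order,
    then drop the duplicated samples that were added as padding."""
    world_size = len(per_rank_outputs)
    merged = []
    for step in range(len(per_rank_outputs[0])):
        for rank in range(world_size):
            merged.append(per_rank_outputs[rank][step])
    return merged[:num_samples]


dataset = list(range(10))
predict = lambda x: x * x  # stand-in for a model forward pass
world_size = 4

outputs = [
    evaluate_shard(dataset, shard_indices(len(dataset), world_size, r), predict)
    for r in range(world_size)
]
distributed_result = gather_and_truncate(outputs, len(dataset))
single_result = [predict(x) for x in dataset]
```

In this sketch each simulated rank evaluates 3 samples instead of 10, which is the efficiency argument; the cost is the extra padding, gathering, and truncation logic.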
