Multi-GPU is slower than single GPU when running examples

duncanswilson · July 11, 2024, 8:17pm

I have a machine with 3 3090s and have been using accelerate with lm_eval to speed up inference and seeing sensible results.

I wrote some custom training scripts using accelerate but noticed about a 3x slowdown vs the single GPU case. While debugging, I decided to try the nlp_example.py and I’m getting significant slowdowns between the single GPU and multi-GPU case there too.

I’ve found this topic on a similar slowdown, but I am able to see performance gains with accelerate and lm_eval, so I doubt that it is a CUDA/pytorch version incompatibility…

For reference, when running nlp_example.py on one GPU I am getting 44 secs total for three epochs and for the multiGPU case I get 4 mins.

I’m happy to provide any other information (package versions and CUDA versions) if that is needed.

duncanswilson · July 24, 2024, 7:33pm

If anyone else runs into this issue: it turned out to be a BIOS level change that was needed in order to fix the communication overhead. Specifically changing the link speed on the PCI ports from Gen 1 to Gen 4 resulted in seeing speedups using multiple GPUs for fine-tuning!

system · July 25, 2024, 7:33am

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Single GPU is faster than multiple GPUs 🤗Accelerate	3	1927	January 31, 2024
What does "--multi_gpu" do under the hood? (and how to use it) 🤗Accelerate	7	6396	May 31, 2023
Run_ner.py slower on multi-GPU than single GPU Beginners	1	1804	September 23, 2020
Multi-gpu training does not optimize as expected Beginners	1	450	February 26, 2024
Multi-gpu inference Beginners	2	827	May 14, 2024

Multi-GPU is slower than single GPU when running examples

Related topics