Is there any way to do distributed training (e.g. DDP) efficiently across GPUs of different generations (say, a 1080 together with a 4090), i.e. without the whole job being bottlenecked by the weakest GPU? I have several older GPUs of mixed models and would like to put them to use.
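A minimal sketch of one common workaround, assuming a single-node launch with `torchrun` and a per-rank speed ratio you measure yourself (the `SPEED_FACTORS` numbers, `BASE_BATCH`, and the toy model/data below are made up for illustration, not a library feature): give the faster GPU a proportionally larger per-rank batch size so each rank finishes its forward/backward in roughly the same wall time. DDP still synchronizes gradients every step, so this only reduces the straggler effect rather than removing it.

```python
# Hedged sketch, not an official recipe: DDP across mismatched GPUs with
# per-rank batch sizes scaled to a *measured* relative speed. SPEED_FACTORS,
# BASE_BATCH and the toy model/data are assumptions for illustration.
# Launch with: torchrun --nproc_per_node=2 hetero_ddp.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

SPEED_FACTORS = {0: 4.0, 1: 1.0}  # e.g. rank 0 = 4090, rank 1 = 1080 (made-up ratios)
BASE_BATCH = 8

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Faster GPUs get proportionally larger micro-batches so every rank
    # finishes its forward/backward in roughly the same wall time.
    batch_size = max(1, int(BASE_BATCH * SPEED_FACTORS.get(rank, 1.0)))

    dataset = TensorDataset(torch.randn(4096, 32), torch.randint(0, 2, (4096,)))
    sampler = DistributedSampler(dataset)  # note: still splits samples evenly
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, drop_last=True)

    model = DDP(nn.Linear(32, 2).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    # join() lets ranks that exhaust their dataloader earlier (because they
    # use bigger batches) shadow the collectives of the slower ranks instead
    # of hanging in the gradient all-reduce.
    with model.join():
        for epoch in range(2):
            sampler.set_epoch(epoch)
            for x, y in loader:
                x, y = x.cuda(local_rank), y.cuda(local_rank)
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()  # gradient all-reduce still syncs all ranks here
                optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Two caveats with this approach: DDP averages gradients by world size rather than by sample count, so uneven batches slightly reweight each rank's contribution; and `DistributedSampler` still hands every rank the same number of samples, so the fast rank runs out of data sooner (which is why the sketch wraps the loop in `join()`). For a fully balanced run you would also want to shard the dataset proportionally, or consider a heterogeneity-aware setup instead of plain DDP.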