Why is there no cross-GPU negative sample gathering for the CLIP model in multi-GPU training?

Hello, I notice that in the transformers code, when training CLIP, negative samples are not gathered across devices in a multi-GPU training scenario. As far as I can tell, gathering negatives from all devices is what the original paper does. Why is there no such implementation?
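
To make the question concrete, this is roughly the in-batch loss I mean (a simplified sketch, not the exact transformers code): each device only contrasts against its own local batch.

```python
import torch
import torch.nn.functional as F

def local_clip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
                    logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss over the *local* batch only.

    With N GPUs and a per-device batch of B, each sample is contrasted
    against B - 1 negatives instead of N * B - 1.
    """
    logits_per_image = logit_scale * image_embeds @ text_embeds.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(image_embeds.size(0), device=image_embeds.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```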

Hi,

The CLIP training script is mainly meant for demonstration purposes and is not optimized for multi-GPU setups.

I’d recommend taking a look at OpenCLIP, which provides this functionality.
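
For reference, here is a minimal sketch of gathering embeddings across devices before computing the loss. It assumes a `torch.distributed` process group is already initialized; the helper names (`gather_with_grad`, `global_clip_loss`) are illustrative, not OpenCLIP's actual API, though the approach is similar in spirit.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_with_grad(t: torch.Tensor) -> torch.Tensor:
    """All-gather a tensor from every rank, splicing the local tensor back
    in so gradients still flow through this rank's own embeddings."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(t) for _ in range(world_size)]
    dist.all_gather(gathered, t)
    # all_gather returns detached copies; keep the autograd-connected local shard
    gathered[dist.get_rank()] = t
    return torch.cat(gathered, dim=0)

def global_clip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
                     logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss where negatives come from *all* devices."""
    all_images = gather_with_grad(image_embeds)
    all_texts = gather_with_grad(text_embeds)
    logits_per_image = logit_scale * all_images @ all_texts.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(all_images.size(0), device=all_images.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```

With this, each sample is contrasted against the full global batch, which is the setup described in the CLIP paper for large-batch training.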

Thanks! I will have a look at that.