Why is there no cross-GPU negative sample gathering for the CLIP model in multi-GPU training?

Hello, I notice that in the transformers code, when training CLIP, negative samples are not gathered across devices in a multi-GPU training scenario. As far as I can tell, gathering negatives from all devices is what the original paper does. Why is there no such implementation?
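
To make the question concrete, this is roughly the in-batch loss I mean (a simplified sketch, not the exact transformers code): each device only contrasts against its own local batch.

```python
import torch
import torch.nn.functional as F

def local_clip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
                    logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss over the *local* batch only.

    With N GPUs and a per-device batch of B, each sample is contrasted
    against B - 1 negatives instead of N * B - 1.
    """
    logits_per_image = logit_scale * image_embeds @ text_embeds.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(image_embeds.size(0), device=image_embeds.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```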

Hi,

The CLIP training script is mainly meant for demonstration purposes and is not optimized for multi-GPU setups.

I’d recommend taking a look at OpenCLIP, which provides this functionality.
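
For reference, here is a minimal sketch of gathering embeddings across devices before computing the loss. It assumes a `torch.distributed` process group is already initialized; the helper names (`gather_with_grad`, `global_clip_loss`) are illustrative, not OpenCLIP's actual API, though the approach is similar in spirit.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_with_grad(t: torch.Tensor) -> torch.Tensor:
    """All-gather a tensor from every rank, splicing the local tensor back
    in so gradients still flow through this rank's own embeddings."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(t) for _ in range(world_size)]
    dist.all_gather(gathered, t)
    # all_gather returns detached copies; keep the autograd-connected local shard
    gathered[dist.get_rank()] = t
    return torch.cat(gathered, dim=0)

def global_clip_loss(image_embeds: torch.Tensor, text_embeds: torch.Tensor,
                     logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss where negatives come from *all* devices."""
    all_images = gather_with_grad(image_embeds)
    all_texts = gather_with_grad(text_embeds)
    logits_per_image = logit_scale * all_images @ all_texts.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(all_images.size(0), device=all_images.device)
    return (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2
```

With this, each sample is contrasted against the full global batch, which is the setup described in the CLIP paper for large-batch training.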

Thanks! I will have a look at that.