Hey, I have a question about how to use shared memory with accelerate.
I have been using accelerate to streamline my multi-GPU training, specifically distributed training across 4 GPUs. However, my dataset is very large (40GB), and when it is copied into each of the 4 GPU processes it takes up over 160GB of RAM.
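For context, my setup looks roughly like the minimal sketch below (the file name, batch size, etc. are placeholders, and I've left out the model and optimizer):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from accelerate import Accelerator

accelerator = Accelerator()

# Each of the 4 processes started by `accelerate launch` runs this script,
# so the 40GB tensor is loaded once per process (~160GB of host RAM total).
data = torch.load("dataset.pt")            # placeholder path; one big tensor
dataset = TensorDataset(data)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# model and optimizer omitted; they go through prepare() as usual
loader = accelerator.prepare(loader)
```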
The dataset itself is just a single tensor object containing the same data on every device. Is there a way to force accelerate to use a single shared-memory location for the dataset so that it only takes 40GB of RAM instead of 160GB?
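To be clearer about what I mean by "shared memory": as far as I understand, a torch tensor can be moved into shared memory with `share_memory_()` (sketch below), but since `accelerate launch` starts 4 separate processes that each call `torch.load()`, I don't see how they would all end up mapping the same 40GB block instead of holding 4 private copies.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# share_memory_() moves the tensor's storage into shared memory, but each of
# the 4 launched processes still runs its own torch.load(), so as far as I
# can tell this still ends up with 4 separate 40GB allocations.
data = torch.load("dataset.pt")   # placeholder path
data.share_memory_()
```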