Shared Memory in Accelerate

Hey, I have a question about how to use shared memory with Accelerate.

I have been using Accelerate to streamline my multi-GPU training, specifically distributed training across 4 GPUs. However, my dataset is very large (40GB), and when it is copied for each of the 4 GPUs it takes up over 160GB of RAM.

The dataset itself is just a single tensor object that contains the same data on each device. Is there a way to force Accelerate to use a single shared memory location for the dataset so that it only takes 40GB of RAM instead of 160GB?

Not really. That’s why we use Datasets in all our examples: it caches everything on disk, so nothing takes up space in RAM in this kind of distributed training.
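
For reference, a rough, untested sketch of that approach (with a small made-up tensor standing in for the real 40GB one): the data is written once as Arrow files, and every process memory-maps the same files from disk instead of holding its own copy in RAM.

```python
import torch
from datasets import Dataset, load_from_disk

# One-off conversion: write the tensor out as an Arrow-backed dataset.
# (Hypothetical small tensor standing in for the real data.)
data = torch.randn(10_000, 128)
Dataset.from_dict({"features": data.tolist()}).save_to_disk("my_dataset")

# In the training script, every process memory-maps the same Arrow files,
# so the data is read from disk rather than duplicated in each process's RAM.
ds = load_from_disk("my_dataset").with_format("torch")
print(ds[0]["features"].shape)  # torch.Size([128])
```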

One workaround would be to define your datasets as None/empty on all processes except process 0 and use dispatch_batches=True.
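
If I understand the suggestion correctly, the setup would look roughly like this. This is an untested sketch with hypothetical names (`feature_dim`, the stand-in tensor); depending on your Accelerate version, `dispatch_batches` is passed directly to `Accelerator` or via `accelerate.utils.DataLoaderConfiguration`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(dispatch_batches=True)

feature_dim = 128  # hypothetical feature size
if accelerator.is_main_process:
    # Only process 0 holds the real data (small stand-in tensor here).
    big_tensor = torch.randn(100_000, feature_dim)
    dataset = TensorDataset(big_tensor)
else:
    # Empty placeholder with the same column shape on the other processes.
    dataset = TensorDataset(torch.empty(0, feature_dim))

dataloader = accelerator.prepare(DataLoader(dataset, batch_size=32))

for (batch,) in dataloader:
    # With dispatch_batches=True, process 0 reads each batch and Accelerate
    # slices and broadcasts it to the other processes, so they never need
    # the full dataset in memory.
    ...
```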


@sgugger, I have a related question. I am trying to understand how distributed code with Accelerate works. How do I synchronize certain variables across different processes? Is FileStorage the only possible shared memory?

@sgugger Thanks for the response! It has been a while, but I have a follow-up question to this. I did what you suggested and set all of my datasets to be empty except for the one on the main process.

However, I guess I am a bit unsure about the internals of how dispatch_batches actually operates. Since all the other processes have empty datasets, they fly through my training loop and everything becomes out of sync pretty fast.

Is there any way to keep the processes synced up with Accelerate when only one process actually has data and dispatch_batches is used?

It’s kind of a weird use case for the API but I appreciate any advice on this!