I am using Accelerate to run distributed inference (I use scores from a pretrained model to do other things in a program). Currently I am getting an OOM error fairly far into my eval dataset. Since I have a fixed batch size and pad all batches to the same max sequence length, it doesn't seem like the model or one particular oversized batch is the problem.
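For context, here is the kind of helper I've been using to watch GPU memory between batches (just a sketch; the helper name is mine, and it silently does nothing on CPU-only machines):

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print current CUDA memory usage, labeled with a tag."""
    if not torch.cuda.is_available():
        return  # nothing to report on CPU-only machines
    allocated = torch.cuda.memory_allocated() / 2**20  # MiB actually in tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # MiB held by the caching allocator
    print(f"[{tag}] allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```

Calling this at the start of each dataset (e.g. `log_cuda_memory("dataset 3")`) is what shows me the allocated number creeping up over time rather than spiking on one batch.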
I was hoping someone could help me understand more clearly what happens under the hood so I can debug.
When I run:
```python
accelerator = Accelerator()
dataloader, NLI_model = accelerator.prepare(dataloader, NLI_model)
```
- Does the entire dataloader get moved to the GPU by the `prepare` method? If I store a bunch of samples in my dataset, do these sit in GPU RAM, or do batches only get moved to the GPU when they are collated?
- Since I am running on several datasets, I call `accelerator.prepare` many times. Could this somehow be accumulating memory?
- Should I put everything in one dataset first?
- Should I keep using the same accelerator, and if so, do I need to do something to clear out the old dataloader before preparing a new one?