Hi, I'm fine-tuning a 1B-parameter model on two GPUs with 24 GB of CUDA memory each, and I'm using Trainer to train it. When I hit the OOM error, all 24 GB on GPU 0 is used up, but only 7 GB is used on GPU 1. Is there a way to distribute memory usage evenly across the GPUs? I'm already using a batch size of 1 and fp16, but I still run out of memory.
I am in the same situation! Did you ever find out how to fix this?
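Not a confirmed fix, but a rough back-of-the-envelope estimate may help explain the imbalance. The sketch below assumes mixed-precision (fp16) training with AdamW and uses the commonly cited approximate costs of 6 bytes per parameter for weights (fp32 master copy plus fp16 copy), 4 bytes for gradients, and 8 bytes for optimizer states. The function name and the byte counts are illustrative assumptions, and activations and CUDA overhead are not included.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimate_training_memory_gib(num_params,
                                 bytes_weights=6,    # assumed: fp32 master + fp16 copy
                                 bytes_grads=4,      # assumed: fp32 gradients
                                 bytes_optimizer=8): # assumed: AdamW first and second moments
    """Rough memory for weights, gradients and optimizer state (no activations)."""
    total_bytes = num_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / GIB


# 1B parameters, as in the original post
estimate = estimate_training_memory_gib(1_000_000_000)
print(f"Approximate model + gradient + optimizer memory: {estimate:.1f} GiB")
```

Under these assumptions, the training state alone takes most of a 24 GB card before any activations are stored. If the script is started with plain `python` while both GPUs are visible, Trainer generally wraps the model in `torch.nn.DataParallel`, which keeps the master parameters, gradients, and optimizer state on GPU 0 and only places a replica plus activations on GPU 1. That would be consistent with the 24 GB vs. 7 GB split described above. Launching with `torchrun` switches to DistributedDataParallel, but each GPU then holds the full training state, so sharding approaches such as DeepSpeed ZeRO or FSDP are probably the options worth checking for actually spreading that state across both cards.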