As @kkumari06 says, reduce the batch size. I recommend restarting the kernel any time you get this error, to make sure you start with clean GPU memory; then cut the batch size in half. Repeat until it fits in GPU memory or until you hit a batch size of 1… in which case, you'll need to switch to a smaller pretrained model. (If training a model from scratch, you can instead reduce the size of your model, for example by reducing the maximum input size or the number of layers.) Finally, you may want to increase gradient accumulation if your batch size ends up very small. For example, with a batch size of 4, gradient accumulation over 8 steps gives you an "effective" batch size of 32, which some research suggests is ideal… however, YMMV.
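In case it helps, here's a minimal sketch of what gradient accumulation looks like in a plain PyTorch training loop (the model, sizes, and data here are made-up placeholders just to show the pattern — with the HF `Trainer` you'd set `gradient_accumulation_steps` in `TrainingArguments` instead):

```python
import torch
from torch import nn

# Toy model and optimizer; stand-ins for your real setup.
torch.manual_seed(0)
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

micro_batch_size = 4     # what actually fits in GPU memory
accumulation_steps = 8   # 4 * 8 = "effective" batch size of 32

optimizer.zero_grad()
for step in range(accumulation_steps):
    # Random placeholder data; replace with your dataloader batches.
    x = torch.randn(micro_batch_size, 10)
    y = torch.randint(0, 2, (micro_batch_size,))
    # Scale the loss so accumulated gradients average over the effective batch.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()      # gradients add up across micro-batches

optimizer.step()         # one optimizer update per effective batch
optimizer.zero_grad()
```

The key detail is dividing the loss by `accumulation_steps`, so the summed gradients match what one big batch of 32 would have produced.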