CUDA out of memory error

I encounter the error below when fine-tuning mBART on my dataset:

```
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 10.76 GiB total capacity; 9.57 GiB already allocated; 16.25 MiB free; 9.70 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
```

My training data contains only 5,000 sentences. Could anyone help me sort this out?


@sgugger, could you please help me resolve this error?

Hello @prashanth, you can try reducing the batch size, enabling gradient checkpointing, or training in fp16 to save memory.
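If you're using the `Trainer` API, all three of these can be set through `TrainingArguments`. A minimal sketch (the parameter names are the real `transformers` ones; the values are just example choices, and `fp16=True` assumes you're running on a CUDA GPU):

```python
from transformers import TrainingArguments

# Example memory-saving configuration; tune the values for your setup.
training_args = TrainingArguments(
    output_dir="mbart-finetune",       # hypothetical output directory
    per_device_train_batch_size=2,     # smaller batches use less activation memory
    gradient_checkpointing=True,       # recompute activations in backward pass: slower, but much less memory
    fp16=True,                         # mixed-precision training roughly halves activation memory (needs CUDA)
)
```

Gradient checkpointing trades compute for memory (expect training to be noticeably slower), while fp16 is usually close to free on modern GPUs.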

As @kkumari06 says, reduce the batch size. I recommend restarting the kernel any time you get this error, to make sure you start with clean GPU memory; then cut the batch size in half. Repeat until it fits in GPU memory, or until you hit a batch size of 1… in which case you'll need to switch to a smaller pretrained model. (If training a model from scratch, you can instead reduce the size of your model, for example by reducing the maximum input size or the number of layers.) Finally, you may want to bump up the gradient accumulation if your batch size is very small. For example, with a batch size of 4, a gradient accumulation of 8 would give you an "effective" batch size of 32, which some research suggests is ideal… however, YMMV.
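The arithmetic above can be sketched as a small helper (a hypothetical function name, just to make the relationship explicit):

```python
def effective_batch_size(per_device_batch: int, accum_steps: int, num_devices: int = 1) -> int:
    """Effective batch size seen by the optimizer when gradients are
    accumulated over `accum_steps` micro-batches before each update."""
    return per_device_batch * accum_steps * num_devices

# The example from the post: batch size 4, accumulation 8
print(effective_batch_size(4, 8))  # -> 32
```

With the `Trainer`, the equivalent knob is `gradient_accumulation_steps` in `TrainingArguments`: the optimizer then steps once every N micro-batches, so you keep the memory footprint of the small batch while training with the gradient statistics of the large one.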