RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 15.78 GiB total capacity; 12.36 GiB already allocated; 302.75 MiB free; 14.16 GiB reserved in total by PyTorch)

Hi,

I am trying to train a language model from scratch, but as soon as training starts I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 15.78 GiB total capacity; 12.36 GiB already allocated; 302.75 MiB free; 14.16 GiB reserved in total by PyTorch)

Can anyone help me?

I am so close to training my model, but this keeps happening.

Also, I have tried this code, but it does not fix the issue:

import gc

import torch

# Drop unreferenced Python objects, then release PyTorch's cached CUDA blocks
# back to the driver. Note this cannot free tensors that are still referenced.
gc.collect()
torch.cuda.empty_cache()

Maybe you can try lowering your batch size in TrainingArguments — the current argument is per_device_train_batch_size (per_gpu_train_batch_size is the older, deprecated name).
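For example, a minimal sketch, assuming you are using the Hugging Face transformers Trainer (the output_dir and the specific numbers here are placeholders for illustration) — shrink the per-device batch size and use gradient accumulation to keep the effective batch size the same:

```python
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps
# (times the number of GPUs). Here 4 * 8 = 32, but only 4 samples are resident
# on the GPU at once, which is what drives peak activation memory.
training_args = TrainingArguments(
    output_dir="./results",          # hypothetical path
    per_device_train_batch_size=4,   # lower this until the OOM goes away
    gradient_accumulation_steps=8,   # raise this to compensate
)
```

Gradients are simply summed across the accumulation steps before the optimizer runs, so results should be close to the larger-batch run, at the cost of slower training.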

After reducing the batch size, consider the Adafactor optimizer instead of Adam — Adam keeps two extra state tensors per parameter, so the optimizer state alone can occupy roughly twice the model's memory. Also, if you can get fp16 training to converge for your chosen model, that would cut memory needs a lot.
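A hedged sketch of both suggestions, again assuming the transformers Trainer (the optim="adafactor" value exists in recent transformers versions; older releases used a separate adafactor=True flag instead, so check your version's docs):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # hypothetical path
    per_device_train_batch_size=4,
    fp16=True,                       # half-precision training: large memory savings,
                                     # but verify the model still converges
    optim="adafactor",               # Adafactor stores far less optimizer state
)                                    # than Adam's two moment tensors per parameter
```

If fp16 diverges (NaN losses are the usual symptom), try keeping Adafactor but dropping fp16 first, since the two changes are independent.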