Cuda out of memory error

prashanth · May 14, 2022, 6:37pm

I encounter the below error when I finetune my dataset on mbart

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 10.76 GiB total capacity; 9.57 GiB already allocated; 16.25 MiB free; 9.70 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON

my train data contains only 5000 sentences. Could anyone of you help me in sorting this out?

Prashanth

prashanth · May 14, 2022, 6:38pm

@sgugger can you please help me in resolving this error?

kkumari06 · May 16, 2022, 5:03am

Hello @prashanth , you can try out reducing the batch_size or enable gradient-checkpointing or can do training in fp16 to save memory.

JLinton · May 17, 2022, 5:53pm

As @kkumari06 says, reduce batch size. I recommend restarting the kernel any time you get this error, to make sure you have a clean GPU memory; then cut the batch size in half. Repeat until it fits in GPU memory or until you hit batch size of 1… in which case, you’ll need to switch to a smaller pretrained model. (If training a model from scratch, you can instead reduce the size of your model, for example by reducing maximum input size or reducing number of layers.) Finally, you may want to bump up the gradient accumulation if your batch size is very small. For example, if you have a batch size of 4, gradient accumulation of 8 would give you an “effective” batch size of 32, which some research suggests is ideal… however, YMMV.

Raisa06 · December 10, 2023, 1:33pm

I’m hyper tuning bert-multilingual-uncased model for NER use case. I’m using AWS EC2 instance g4dn.metal which has 8 GPU. My training sample contact 110k samples. I tried the model training instance having 4 GPU I got CUDA out of memory error.

Tried reducing batch size, clear cache, setting up max_split_size (pytorch memory management) didn’t fix the error. So I started the model in bigger instance with 8 GPU still facing the same error.

Can someone please help me out in this?

@kkumari06 , @prashanth

OxxoCodes · December 21, 2023, 11:13pm

@Raisa06 Number of GPUs doesn’t matter unless you’re being very particular to ensure the model is being split among those GPUs. Instead of 1/8th the model being on each GPU, you likely have 8 full copies of the model. Aim for GPUs with more memory per card. Also, if you have already decreased batch size and gradient accumulation steps as much as possible, try using distilbert-base-multilingual-cased, which is a smaller version of the model you’re training. Also, with HF’s accelerate library you can enable deepspeed which is a free memory decrease, and you can also use it to train with mixed precision to further decrease memory usage.

Pekka10 · February 26, 2024, 10:25am

Hi @Raisa06
Getting the same issue were you able to resolve it or not and how you clear the cache for the instance as I am using sagemaker for fine tuning.

ppallavalli · March 16, 2024, 11:43pm

Is it possible to clear the GPU storage by any chance? Even when requests are not being made, the gpu storage does not get cleared

akshat-kumar-akight · April 4, 2024, 1:45pm

Remember to delete unneeded tensors and variables. You should also move the tensors and variables that you want to store to cpu and then call torch.cuda.empty_cache() to free up GPU space, after you’re done with each batch

codewithRiz · July 17, 2024, 2:57pm

by reducing the batch size my error got solved im using kaggle notebook with dual gpu.
note training proces will slow

devcnn5 · December 20, 2024, 7:10am

Try to set max_length=1024 or 512 in your tokenize function which you’re using to tokenize the dataset.

For example in my tokenize function i set it like so:

example[‘input_ids’] = tokenizer(prompt, padding=“max_length”, truncation=True, return_tensors=“pt”, max_length=1024).input_ids
example[‘labels’] = tokenizer(example[‘answer’], padding=“max_length”, truncation=True, return_tensors=“pt”, max_length=1024).input_ids

vinaysan · January 27, 2025, 10:00am

Thank you. This is especially important for models like deberta, which have no max_length.

Topic		Replies	Views
RuntimeError: CUDA out of memory. Tried to allocate 1.91 GiB (GPU 0; 15.78 GiB total capacity; 12.36 GiB already allocated; 302.75 MiB free; 14.16 GiB reserved in total by PyTorch) Beginners	2	1378	September 11, 2021
RuntimeError: CUDA out of memory even with simple inference Beginners	1	5409	January 16, 2022
Solving "CUDA out of memory" when fine-tuning GPT-2 🤗Transformers	0	1422	January 6, 2022
RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 11.17 GiB total capacity; 10.62 GiB already allocated; 145.81 MiB free; 10.66 GiB reserved in total by PyTorch) Beginners	8	27530	December 10, 2023
CUDA memory suddenly run out of space when only used a quarter of memory Models	0	1143	January 7, 2023

Cuda out of memory error

Related topics