torch.cuda.OutOfMemoryError

oshani · July 5, 2023, 3:05am

Hi All, I keep getting this error while running transformers train.train():

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 39.50 GiB total capacity; 38.72 GiB already allocated; 225.12 MiB free; 38.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
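For readers who want to check the figures in a message like this, here is a minimal sketch that pulls the numbers out and compares them. The helper names `parse_oom_message` and `diagnose` are hypothetical (they are not part of PyTorch), and the regular expressions assume the message layout shown above, which may differ in other PyTorch versions. For this message, reserved and allocated are both 38.72 GiB, so there is essentially no fragmented memory to recover, and the single 16 GiB request is far larger than what is left on the card.

```python
import re

# Convert the units that appear in the message to GiB.
UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}

def parse_oom_message(message):
    """Extract the sizes from a CUDA OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (GiB|MiB)",
        "total": r"([\d.]+) (GiB|MiB) total capacity",
        "allocated": r"([\d.]+) (GiB|MiB) already allocated",
        "free": r"([\d.]+) (GiB|MiB) free",
        "reserved": r"([\d.]+) (GiB|MiB) reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            stats[name] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return stats

def diagnose(stats):
    """Compare the parsed sizes (all in GiB)."""
    return {
        # Memory PyTorch holds but is not using; large values hint at fragmentation.
        "fragmentation_slack_gib": stats["reserved"] - stats["allocated"],
        # True when the request is bigger than everything not yet allocated.
        "request_exceeds_unallocated": stats["requested"] > stats["total"] - stats["allocated"],
    }

message = (
    "CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 39.50 GiB total capacity; "
    "38.72 GiB already allocated; 225.12 MiB free; 38.72 GiB reserved in total by PyTorch)"
)
stats = parse_oom_message(message)
result = diagnose(stats)
print(stats)
print(result)
```
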

Following some advice online, I tried setting PYTORCH_CUDA_ALLOC_CONF to “garbage_collection_threshold:0.6,max_split_size_mb:128” and also adding

torch.cuda.empty_cache()

to my code but that doesn’t help.
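For reference, a minimal sketch of one way to apply that setting from inside a script and then inspect what PyTorch is holding. It assumes the lines run at the very top of the training script, before torch is imported, so that the allocator sees the variable when it starts up. The helper `report_cuda_memory` is a hypothetical name; it relies only on `torch.cuda.is_available`, `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved`, and returns `None` when torch or a GPU is not present.

```python
import os

# Assumption: this runs before torch is imported, so the caching allocator
# picks the value up when it is initialized. The value is the one from the post.
os.environ.setdefault(
    "PYTORCH_CUDA_ALLOC_CONF",
    "garbage_collection_threshold:0.6,max_split_size_mb:128",
)

try:
    import torch
except ImportError:
    torch = None  # lets the sketch run on a machine without PyTorch

def report_cuda_memory(device=0):
    """Return allocated/reserved memory in GiB, or None without a usable GPU."""
    if torch is None or not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return {
        "allocated_gib": torch.cuda.memory_allocated(device) / gib,
        "reserved_gib": torch.cuda.memory_reserved(device) / gib,
    }

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
print(report_cuda_memory())
```

Calling `report_cuda_memory()` just before the failing step shows whether reserved memory is much larger than allocated memory, which is the case these allocator settings are meant for.
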

So any ideas? My GPU is an A100 with 40 GB of memory, and I use cuda-11.4 and torch-2.0.1.

Thanks,

Oren
