Setting PyTorch CUDA memory configuration while using HF transformers

I'm seeing the error below when trying to continue training XLM with the transformers library.

RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.92 GiB total capacity; 10.83 GiB already allocated; 442.62 MiB free; 10.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have set the environment variable as suggested in the Memory Management section of the CUDA semantics — PyTorch 1.10.0 documentation, and my script logs the configured value at startup:

11/09/2021 01:21:12 PM Pytorch CUDA conf max_split_size_mb:40
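For reference, this is roughly how I'm setting it in the training script (a minimal sketch; the value matches the log line above). My understanding is that PYTORCH_CUDA_ALLOC_CONF has to be set before the first CUDA allocation, so I set it before importing torch:

```python
import os

# Must be set before torch initializes the CUDA caching allocator;
# changing it after the first allocation has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:40"

print("Pytorch CUDA conf", os.environ["PYTORCH_CUDA_ALLOC_CONF"])

# torch (and transformers) are imported only after the variable is set
```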

(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "transformers"
transformers==4.12.3

(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "torch"
torch==1.10.0