I am seeing the error below when trying to continue pretraining XLM using the transformers library.
RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.92 GiB total capacity; 10.83 GiB already allocated; 442.62 MiB free; 10.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I have set the environment variable as suggested in the Memory Management section of the PyTorch 1.10.0 CUDA semantics documentation:
11/09/2021 01:21:12 PM Pytorch CUDA conf max_split_size_mb:40
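For reference, this is how I am setting the allocator option before launching training (the value 40 matches the log line above; the script name `run_mlm.py` here is just a placeholder for my actual training command):

```shell
# Set the CUDA caching allocator config before the training process starts;
# it is read by PyTorch at CUDA initialization time.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:40

# Confirm the variable is visible to child processes
echo "$PYTORCH_CUDA_ALLOC_CONF"

# python run_mlm.py ...  # placeholder for the actual training invocation
```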
(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "transformers"
transformers==4.12.3
(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "torch"
torch==1.10.0