Setting PyTorch CUDA memory configuration while using HF transformers

I'm seeing the error below when trying to continue training XLM with the transformers library.

RuntimeError: CUDA out of memory. Tried to allocate 978.00 MiB (GPU 0; 11.92 GiB total capacity; 10.83 GiB already allocated; 442.62 MiB free; 10.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have set the environment variable as suggested in the Memory Management section of the CUDA semantics — PyTorch 1.10.0 documentation, and my script logs the configured value at startup:

11/09/2021 01:21:12 PM Pytorch CUDA conf max_split_size_mb:40
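For reference, this is roughly how I'm setting it in the training script (a minimal sketch; the value matches the log line above). My understanding is that PYTORCH_CUDA_ALLOC_CONF has to be set before the first CUDA allocation, so I set it before importing torch:

```python
import os

# Must be set before torch initializes the CUDA caching allocator;
# changing it after the first allocation has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:40"

print("Pytorch CUDA conf", os.environ["PYTORCH_CUDA_ALLOC_CONF"])

# torch (and transformers) are imported only after the variable is set
```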

(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "transformers"
transformers==4.12.3

(pair-nlp) [abpu9500@login12 continued-pretraining]$ pip freeze | grep "torch"
torch==1.10.0