I am trying to fine-tune the facebook/wav2vec2-base
model, but I keep running into CUDA out-of-memory errors after a few epochs:
File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 631, in forward
hidden_states, attn_weights, _ = self.attention(
File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 553, in forward
attn_weights = nn.functional.softmax(attn_weights, dim=-1)
File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/torch/nn/functional.py", line 1680, in softmax
ret = input.softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 4.24 GiB (GPU 0; 10.92 GiB total capacity; 5.63 GiB already allocated; 3.44 GiB free; 6.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am running this training on 4x GeForce GTX 1080 Ti GPUs (11 GB VRAM each), using only a batch size of 1 and no gradient accumulation whatsoever.
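For reference, this is roughly how training is set up (a simplified sketch: `train_dataset` and `data_collator` stand in for my actual data pipeline, and the output path is a placeholder):

```python
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC

# Simplified sketch of my setup; train_dataset and data_collator are
# placeholders for my actual (omitted) data loading and preprocessing.
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base")

training_args = TrainingArguments(
    output_dir="./wav2vec2-base-finetuned",  # placeholder path
    per_device_train_batch_size=1,           # batch size of 1
    gradient_accumulation_steps=1,           # no gradient accumulation
    num_train_epochs=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder
    data_collator=data_collator,  # placeholder
)
trainer.train()
```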
Can I do anything about this? Should I use a smaller model to begin with? If so, what options do I have?
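For what it's worth, the knobs I am aware of (but have not verified) are gradient checkpointing, freezing the convolutional feature encoder, and filtering out very long clips, since the attention matrices where the OOM occurs grow quadratically with input length. A sketch of what I mean:

```python
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base")

# Recompute activations during the backward pass instead of storing them
# (trades compute for memory).
model.gradient_checkpointing_enable()

# Don't backprop through the CNN feature encoder; on older transformers
# versions this method is called freeze_feature_extractor().
model.freeze_feature_encoder()

# Assumption: drop clips longer than ~20 s at 16 kHz; train_dataset and the
# "audio" column layout are placeholders for my actual datasets.Dataset.
MAX_SAMPLES = 20 * 16_000
train_dataset = train_dataset.filter(
    lambda ex: len(ex["audio"]["array"]) <= MAX_SAMPLES
)
```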