Constantly running out of memory fine-tuning Wav2Vec2

I am currently trying to fine-tune the facebook/wav2vec2-base model, but I keep running out of GPU memory after a few epochs:

  File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/transformers/models/wav2vec2/", line 631, in forward
    hidden_states, attn_weights, _ = self.attention(
  File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/torch/nn/modules/", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/transformers/models/wav2vec2/", line 553, in forward
    attn_weights = nn.functional.softmax(attn_weights, dim=-1)
  File "/home/sfalk/miniconda3/envs/speech/lib/python3.8/site-packages/torch/nn/", line 1680, in softmax
    ret = input.softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 4.24 GiB (GPU 0; 10.92 GiB total capacity; 5.63 GiB already allocated; 3.44 GiB free; 6.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am running this training on 4x GeForce GTX 1080 (11 GB VRAM each) with a batch size of 1 and no gradient accumulation.

Can I do anything about this? Should I use a smaller model to begin with? If so, what options do I have?

I'm running into the same issue. Did you find out what caused it?
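
One way to read the traceback: the failing allocation (4.24 GiB) happens in the attention softmax, whose output has one score per pair of frames per head, so its size grows with the square of the audio length. With a batch size of 1, that points at a single very long clip rather than at the model itself. The sketch below is a rough back-of-the-envelope estimate, not a measurement. It assumes the default wav2vec2-base settings as I understand them (7 convolution layers with the kernels and strides listed, 12 attention heads), float32 scores, and 16 kHz audio; none of these values come from the thread, and the function names are made up for illustration.

```python
import math

# Assumed defaults for wav2vec2-base (not confirmed anywhere in this thread).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
NUM_HEADS = 12
BYTES_PER_FLOAT32 = 4
SAMPLE_RATE = 16_000  # Hz, assumed input sampling rate


def num_frames(num_samples: int) -> int:
    """Number of encoder frames produced by the unpadded conv feature extractor."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip too short to survive this layer
        length = (length - kernel) // stride + 1
    return length


def attention_scores_bytes(num_samples: int, batch_size: int = 1) -> int:
    """Size of ONE layer's attention score tensor: batch * heads * T * T floats."""
    t = num_frames(num_samples)
    return batch_size * NUM_HEADS * t * t * BYTES_PER_FLOAT32


def frames_for_allocation(alloc_bytes: int, batch_size: int = 1) -> int:
    """Invert the formula above: how many frames would need `alloc_bytes`?"""
    per_pair = batch_size * NUM_HEADS * BYTES_PER_FLOAT32
    return math.isqrt(alloc_bytes // per_pair)


if __name__ == "__main__":
    failed_alloc = int(4.24 * 2**30)  # the allocation size from the error message
    frames = frames_for_allocation(failed_alloc)
    # Total stride of the conv stack is the product of the strides (320 samples).
    seconds = frames * math.prod(CONV_STRIDES) / SAMPLE_RATE
    print(f"~{frames} frames, i.e. roughly {seconds:.0f} s of audio in one example")
```

Under these assumptions the failed allocation corresponds to a clip a bit over three minutes long. If the dataset contains a few clips like that, the run would only crash when one of them comes up, which would fit the "fine for a while, then OOM" pattern. Filtering or truncating the longest examples before training is a cheaper first thing to try than switching to a smaller model.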