How much memory is needed to fine-tune wav2vec2?

I’m trying to replicate this blog post on fine-tuning XLSR (Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers) and I’m running into CUDA out-of-memory issues. I’m training on a machine with multiple NVIDIA Titan V GPUs (12 GB memory each), and even when I:

  1. reduce the batch size to 1
  2. remove all clips longer than 5 seconds (I even reduced this to 2 seconds)
  3. use Adafactor instead of AdamW (as suggested here: Performance and Scalability: How To Fit a Bigger Model and Train It Faster)

I still run out of memory. I’m not sure whether this suggests there is a bug in my code somewhere or I simply don’t have enough memory to do this — wait, no em-dash — so any advice would be appreciated!
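For reference, a minimal sketch of the clip-length filtering step (item 2 above). The 16 kHz sampling rate matches the XLSR blog post's Common Voice setup; the function name and the 5-second cutoff here are illustrative assumptions:

```python
# Sketch: drop clips longer than a cutoff before training.
# 16 kHz is the sampling rate used in the XLSR blog post;
# the 5-second cutoff mirrors the filtering described above.
SAMPLING_RATE = 16_000
MAX_SECONDS = 5.0

def is_short_enough(audio_array, sampling_rate=SAMPLING_RATE, max_seconds=MAX_SECONDS):
    """Return True if the clip's duration is within the cutoff."""
    return len(audio_array) / sampling_rate <= max_seconds

# With a 🤗 datasets Dataset this could be applied as, e.g.:
#   dataset = dataset.filter(lambda ex: is_short_enough(ex["audio"]["array"]))
clip = [0.0] * (3 * SAMPLING_RATE)   # a fake 3-second clip
print(is_short_enough(clip))         # → True
```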

Is it able to start training before running into CUDA issues? Have you tried model sharding, since you only have access to 12 GB GPUs? You could also try cloud resources; they offer 15 GB and 32 GB GPUs.


I don’t think it gets to the stage where it reports x/y epochs, etc., but the cell with trainer.train() does at least some computation before running out of memory.

I haven’t tried model sharding, thanks for suggesting that - I’ll look into it!
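For anyone landing here later: a rough sketch of the model-sharding idea in plain PyTorch, splitting a network across two devices and moving activations between them. The toy two-layer model is illustrative, not the actual Wav2Vec2 architecture, and the sketch falls back to CPU so it runs anywhere; with two 12 GB Titan Vs the devices would be "cuda:0" and "cuda:1":

```python
import torch
import torch.nn as nn

# Toy sketch of manual model sharding: the first half of the network lives
# on one device, the second half on another, and activations hop between
# them in forward(). Falls back to CPU when two GPUs are not available.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class ShardedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(16, 32).to(dev0)  # would sit on GPU 0
        self.part2 = nn.Linear(32, 8).to(dev1)   # would sit on GPU 1

    def forward(self, x):
        x = self.part1(x.to(dev0))
        x = self.part2(x.to(dev1))  # activations move to the second device
        return x

out = ShardedNet()(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 8])
```

Each shard then only needs to hold its own parameters, gradients, and optimizer state, which is the memory saving the suggestion above is after.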

If that doesn’t work, maybe I’ll look into cloud options.
