CUDA out-of-memory error when training a Whisper model on a single GPU

Hello Hugging Face community!

I hope whoever reads this is having a great day! :slight_smile:

I’m working on a project fine-tuning a Whisper model. I got it going with the small model without much issue, and I got the medium model working on a smaller dataset.
But training the medium model on the bigger dataset I have been working on is not going well. I’m training with a batch size of 1 and have tried setting gradient_accumulation_steps very high (32, 64, and 96), but I still hit the same out-of-memory error. I’m running this on a single 2080 Ti with 12GB of VRAM. Is there anything I’m not thinking of that I could try? Or am I just going to have to use another tower with a second GPU (and integrate DeepSpeed), or one with more VRAM?
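
For context, here is a rough back-of-the-envelope sketch of why the card may be tight before any audio is even loaded. It assumes the commonly cited figure of about 769M parameters for Whisper medium, full fp32 training, and an Adam-style optimizer that keeps two fp32 states per parameter. None of those details are confirmed by my actual setup, and the function name is just illustrative. It also ignores activations, the CUDA context, and fragmentation, so real usage would be higher. Note that gradient accumulation changes only the effective batch size, not the memory needed per step.

```python
# Rough estimate of the memory needed just for weights, gradients,
# and optimizer states. Activations are NOT included.

GIB = 1024 ** 3  # bytes per GiB


def estimate_training_state_gib(
    num_params: int,
    weight_bytes: int = 4,     # assumption: fp32 weights
    grad_bytes: int = 4,       # assumption: fp32 gradients
    optimizer_bytes: int = 8,  # assumption: Adam keeps two fp32 states per parameter
) -> float:
    """Return the estimated size in GiB of the per-parameter training state."""
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    whisper_medium_params = 769_000_000  # assumption: approximate published size
    estimate = estimate_training_state_gib(whisper_medium_params)
    print(f"Estimated training state: {estimate:.2f} GiB")
```

Under these assumptions the estimate comes out at roughly 11.5 GiB, which is already close to the whole card. If that is right, options that shrink the per-parameter state (for example a lower-precision optimizer) or the activations (for example gradient checkpointing) would matter more than the accumulation setting.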

I genuinely appreciate anyone’s time and input! Thanks for reading, and I hope everyone has a great rest of the day! :smiley: