File size/speech length limit for Wav2Vec2?

Hi there. I’ve been trying out Hugging Face’s implementation of Wav2Vec2 for transcription on Colab Pro, and got pretty good results on short speeches under 80 seconds. Anything longer than that crashes the notebook, even when I switch to the High-RAM runtime or compress the audio file drastically.

Is there a practical limit to the length of the audio clip that can/should be run on HF-Wav2Vec2? I tried looking for documentation on this, but might have missed it.

Appreciate any pointers on this.


Answering my own question in case anyone stumbles on this and wants a quick solution: it seems to be a memory issue. I cobbled together a simple, if clumsy, way to split the audio into shorter clips and transcribe them one at a time. See the attached screen-grab or check out the notebooks in my repo for this project: GitHub - chuachinhon/wav2vec2_transformers: Transcribing audio files using Hugging Face's implementation of Wav2Vec2
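For anyone who wants the idea in text form, here is a minimal sketch of chunk-by-chunk transcription. It is not the exact code from the notebooks. The names `iter_chunks` and `transcribe_long_audio`, the 30-second chunk length, the `facebook/wav2vec2-base-960h` checkpoint and the use of librosa for loading are all my assumptions for illustration.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int = 16_000,
                chunk_seconds: float = 30.0) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping slices of at most chunk_seconds each."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def transcribe_long_audio(path: str,
                          model_name: str = "facebook/wav2vec2-base-960h",
                          chunk_seconds: float = 30.0) -> str:
    """Transcribe a long file one chunk at a time so memory use stays bounded."""
    # Heavy imports are kept local so iter_chunks works without them installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name).eval()

    # This checkpoint expects 16 kHz mono audio; librosa resamples on load.
    speech, sample_rate = librosa.load(path, sr=16_000)

    pieces = []
    for chunk in iter_chunks(speech, sample_rate, chunk_seconds):
        # Assumption: clips under ~400 samples (25 ms) are too short for the
        # model's convolutional front end, so a tiny trailing chunk is skipped.
        if len(chunk) < 400:
            continue
        inputs = processor(chunk, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():  # inference only, so no gradient buffers are kept
            logits = model(inputs.input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        pieces.append(processor.batch_decode(predicted_ids)[0])
    return " ".join(pieces)
```

Calling `transcribe_long_audio("speech.wav")` (a hypothetical file name) would return one string for the whole recording. Cutting at fixed boundaries can split a word in half, so splitting at silences or overlapping the chunks slightly should give cleaner results.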


@lysandre has a far better solution to this issue. See this GitHub issue: can't allocate memory error with wav2vec2 · Issue #10366 · huggingface/transformers · GitHub

Code screen grab below


Thank you for sharing and happy the snippet helps you!