Hi there. I’ve been trying out Hugging Face’s implementation of Wav2Vec2 for transcription on Colab Pro, and I got pretty good results on short speeches under 80 seconds. Anything longer than that just crashes the notebook, even when I set the runtime to High-RAM or compress the audio file drastically.
Is there a practical limit to the length of the audio clip that can/should be run on HF-Wav2Vec2? I tried looking for documentation on this, but I might have missed it.
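In case it helps frame the question, here is a minimal sketch of the workaround I'm considering: cutting the waveform into fixed-length pieces and transcribing each piece separately. It assumes the audio is already loaded as a 16 kHz mono array. The helper name `chunk_bounds` and the 30-second chunk length are my own placeholders, not anything from the library.

```python
def chunk_bounds(num_samples, sample_rate=16_000, chunk_seconds=30.0):
    """Return (start, end) sample indices that cover a clip in fixed-size chunks.

    Hypothetical helper: the 30 s default is an arbitrary guess at a length
    that fits in memory, not a documented limit.
    """
    if num_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and length must be > 0")
    # Chunk length in samples; at least 1 so range() always advances.
    chunk_len = max(1, int(chunk_seconds * sample_rate))
    # The last chunk is clipped to the end of the audio, so it may be shorter.
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# An 80-second clip at 16 kHz splits into two full chunks and one 20 s remainder.
bounds = chunk_bounds(80 * 16_000)
```

Each `waveform[start:end]` slice would then go through the model on its own and the transcripts would be joined afterwards. I assume cutting at fixed offsets can split a word in half and hurt accuracy near the boundaries, so this is only a rough starting point.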