Wav2Vec2 for long audio files


I’m trying to apply Wav2Vec2 models to long audio files (~1 h) for speech-to-text.
However, processing the entire audio file at once is not feasible because it requires more than 16 GB of memory. How can I feed a sound file to the Wav2Vec2 model as an audio stream?


Here is one way to do this with librosa.stream:
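A minimal sketch of that approach, with some assumptions on my side: the checkpoint `facebook/wav2vec2-base-960h`, the 30-second block size, and the helper names `transcribe_stream` and `join_pieces` are illustrative choices, not something fixed by the question. It reads the file block by block with `librosa.stream`, so only one block is in memory at a time, and runs greedy CTC decoding on each block.

```python
def join_pieces(pieces):
    """Join per-block transcripts, dropping empty ones (hypothetical helper)."""
    return " ".join(p.strip() for p in pieces if p.strip())


def transcribe_stream(path, model_name="facebook/wav2vec2-base-960h", block_seconds=30):
    """Transcribe a long file block by block (sketch, not a tuned implementation)."""
    # Heavy dependencies are imported here so they load only when transcribing.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name).eval()

    # librosa.stream reads at the file's native rate and does not resample.
    sr = librosa.get_samplerate(path)

    # frame_length == hop_length == sr gives non-overlapping one-second frames,
    # so each block holds roughly block_seconds of audio.
    stream = librosa.stream(
        path,
        block_length=block_seconds,
        frame_length=sr,
        hop_length=sr,
        mono=True,
    )

    pieces = []
    for block in stream:
        # The assumed checkpoint expects 16 kHz input.
        if sr != 16000:
            block = librosa.resample(block, orig_sr=sr, target_sr=16000)
        # Skip a very short trailing block (threshold is an arbitrary 0.1 s).
        if len(block) < 1600:
            continue
        inputs = processor(block, sampling_rate=16000, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        pieces.append(processor.batch_decode(predicted_ids)[0])

    return join_pieces(pieces)
```

Calling `transcribe_stream("my_long_file.wav")` (a placeholder path) returns the joined text. Two caveats: `librosa.stream` relies on `soundfile`, so the file has to be in a format it can read, and the hard cuts between blocks can split a word in two, which tends to hurt accuracy near block boundaries.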


This should actually be extremely easy now with the new chunking feature of the 🤗 Transformers speech recognition pipeline.

This blog post should help: Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
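For reference, a hedged sketch of what the chunked call can look like. The checkpoint, the values `chunk_length_s=10` and `stride_length_s=(4, 2)`, and the function names are assumptions for illustration. `chunk_windows` is only a simplified picture of how overlapping windows cover the audio, not the library's actual implementation.

```python
def chunk_windows(total_s, chunk_length_s=10.0, stride_length_s=(4.0, 2.0)):
    """Illustrative only: (start, end) times of overlapping chunks.

    Each chunk overlaps its neighbours by the left/right stride, so the
    window advances by chunk_length_s - left - right seconds per step.
    """
    left, right = stride_length_s
    step = chunk_length_s - left - right
    if step <= 0:
        raise ValueError("chunk_length_s must exceed the sum of both strides")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows


def transcribe_long_file(path, model_name="facebook/wav2vec2-base-960h",
                         chunk_length_s=10, stride_length_s=(4, 2)):
    """Transcribe a long file with the chunking ASR pipeline (sketch)."""
    # Imported here so the heavy dependency loads only when transcribing.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    # The pipeline splits the audio into overlapping chunks, runs the model
    # on each one, and merges the results into a single transcript.
    result = asr(path, chunk_length_s=chunk_length_s, stride_length_s=stride_length_s)
    return result["text"]
```

Passing a file path to the pipeline needs `ffmpeg` installed for decoding. The overlap from `stride_length_s` gives the model context on both sides of each cut, which is the main advantage over splitting the file into hard, non-overlapping blocks.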
