Online/streaming speech recognition

Are there plans to implement online decoding for speech recognition models such as wav2vec2 and XLSR? More specifically, I'd like to be able to feed in audio in short chunks and get partial transcripts back as they become available.

Motivation

Many use cases are covered by the current wav2vec2 model in the library, which handles batch recognition of pre-recorded audio. However, for an online application that needs to continuously recognize speech on a live input stream, this may not be sufficient.
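To make the request concrete, here is a rough sketch of the kind of loop I have in mind. It is not an existing library API: `partial_transcripts`, `recognize`, and `min_new_samples` are names I made up, and `recognize` stands in for whatever model call turns a list of samples into text. The sketch is deliberately naive and re-decodes the whole buffer on every pass, which is exactly the cost that real online decoding would avoid.

```python
from typing import Callable, Iterable, Iterator, List


def partial_transcripts(
    blocks: Iterable[List[float]],
    recognize: Callable[[List[float]], str],  # hypothetical model call
    min_new_samples: int = 16000,  # assumed: about 1 s of audio at 16 kHz
) -> Iterator[str]:
    """Yield a partial transcript each time enough new audio has arrived."""
    buffer: List[float] = []
    pending = 0  # samples received since the last recognition pass
    for block in blocks:
        buffer.extend(block)
        pending += len(block)
        if pending >= min_new_samples:
            pending = 0
            # Naive: decodes everything received so far, not just the new part.
            yield recognize(buffer)
    if pending:
        # Flush the remaining audio when the stream ends.
        yield recognize(buffer)
```

Usage would be something like `for text in partial_transcripts(mic_blocks, my_model_fn): print(text)`, where `mic_blocks` and `my_model_fn` are placeholders for the live audio source and the model wrapper.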


I would very much like to know whether this is possible too! Have you gotten any further on this, @arkadyark?

please check this one
