[Question] Wav2vec2 word times

Hello, has anyone already tried to get word timestamps while decoding audio with a wav2vec2 model?

Thank you!


I wrote some hacky code to do this and posted it in this Hugging Face GitHub issue: Getting time offsets of beginning and end of each word in Wav2Vec2 · Issue #11307 · huggingface/transformers · GitHub
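
For anyone who wants the general idea without opening the issue, here is a minimal sketch of one way to do it. It is not the code from the issue, just an illustration under a few assumptions: you already have the greedy (argmax) token for every output frame, `<pad>` is the CTC blank, `|` is the word delimiter, and each frame covers about 20 ms (16 kHz audio with a total stride of 320 samples). The function name `word_offsets` and its parameters are made up for this example.

```python
from typing import List, Tuple


def word_offsets(
    frame_tokens: List[str],
    frame_duration: float = 0.02,  # assumed seconds per output frame
    blank: str = "<pad>",          # assumed CTC blank token
    delimiter: str = "|",          # assumed word delimiter token
) -> List[Tuple[str, float, float]]:
    """Collapse per-frame greedy CTC tokens into (word, start_s, end_s)."""
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start = 0  # index of the first frame of the current word
    end = 0    # index one past the last non-blank frame of the current word
    prev = blank

    for i, tok in enumerate(frame_tokens):
        if tok == delimiter:
            # A delimiter closes the current word, if there is one.
            if chars:
                words.append(("".join(chars), start * frame_duration, end * frame_duration))
                chars = []
        elif tok != blank:
            # CTC collapse: a repeated token only counts once unless
            # a blank (or another token) separated the repeats.
            if tok != prev:
                if not chars:
                    start = i
                chars.append(tok)
            end = i + 1
        prev = tok

    # Flush the last word if the audio did not end with a delimiter.
    if chars:
        words.append(("".join(chars), start * frame_duration, end * frame_duration))
    return words


if __name__ == "__main__":
    frames = ["<pad>", "h", "h", "i", "<pad>", "|", "|", "y", "o", "o", "<pad>"]
    for word, start_s, end_s in word_offsets(frames):
        print(f"{word}: {start_s:.2f}s to {end_s:.2f}s")
```

The timestamps are only as precise as the frame size, and greedy CTC output tends to emit a character for just a frame or two, so word ends can come out a bit early. A forced alignment against a known transcript is usually tighter.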

Hi, I have successfully used this repository to obtain word-level timestamps from wav2vec: GitHub - lumaku/ctc-segmentation: Segment an audio file and obtain utterance alignments. (Python package)
