Hello, has anyone already tried to get word-level timestamps while decoding audio with a wav2vec2 model?
Thank you!
I have written some hacky code to do this and posted it in this Hugging Face GitHub issue: Getting time offsets of beginning and end of each word in Wav2Vec2 · Issue #11307 · huggingface/transformers · GitHub
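For anyone who just wants the gist, here is a rough sketch of the general idea (not the code from the issue): take the per-frame argmax tokens from the CTC output, collapse repeats and blanks, and convert frame indices to seconds. The function name `ctc_word_offsets` is made up for illustration, and the defaults are assumptions: `<pad>` as the CTC blank, `|` as the word delimiter, and roughly 20 ms per output frame (the usual wav2vec2 stride of 320 samples at 16 kHz). Check these against your own model and tokenizer.

```python
from itertools import groupby


def ctc_word_offsets(frame_tokens, blank="<pad>", delimiter="|",
                     seconds_per_frame=0.02):
    """Collapse per-frame CTC tokens into words with start/end times.

    frame_tokens: one predicted token string per output frame (argmax).
    blank, delimiter, seconds_per_frame: assumed defaults, adjust per model.
    """
    words = []
    chars, start, end = [], None, None
    frame = 0  # index of the first frame of the current run

    def flush():
        # Emit the word collected so far, if any.
        if chars:
            words.append({
                "word": "".join(chars),
                "start": round(start * seconds_per_frame, 3),
                "end": round(end * seconds_per_frame, 3),
            })

    # groupby merges consecutive identical tokens, which is the CTC
    # "collapse repeats" step; blanks are then skipped.
    for token, run in groupby(frame_tokens):
        length = len(list(run))
        if token == delimiter:
            flush()
            chars = []
        elif token != blank:
            if not chars:
                start = frame          # first frame of the word
            chars.append(token)
            end = frame + length       # one past the last frame of the word
        frame += length

    flush()  # last word may not be followed by a delimiter
    return words


# Toy input standing in for tokenizer.convert_ids_to_tokens(argmax_ids)
frames = ["<pad>", "H", "H", "I", "<pad>", "|", "|", "Y", "O", "<pad>", "O"]
for w in ctc_word_offsets(frames):
    print(w)
```

The timestamps are only as precise as the frame stride, and CTC models tend to emit a character in a single frame, so word ends can come out slightly early.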
Hi, I have successfully used this repository to obtain word-level timestamps from wav2vec: GitHub - lumaku/ctc-segmentation: Segment an audio file and obtain utterance alignments. (Python package)