How to get timestamps for each word in a transcription

anlutfi · May 5, 2021, 4:20am

Hello,

I’m running a very simple script to get the transcription of an audio file. Is there a way to get the timestamp of each word within the original audio?

Here is the code I’m running. Thank you so much for your time and help.

from transformers import Wav2Vec2Tokenizer, Wav2Vec2ForCTC
import librosa as lb
import torch

def extract(fpath):
    # Initialize the tokenizer
    tokenizer = Wav2Vec2Tokenizer.from_pretrained("KBLab/wav2vec2-large-xlsr-53-swedish")

    # Initialize the model
    model = Wav2Vec2ForCTC.from_pretrained("KBLab/wav2vec2-large-xlsr-53-swedish")

    # Read the sound file, resampled to the 16 kHz rate the model expects
    waveform, rate = lb.load(fpath, sr=16000)

    # Tokenize the waveform
    input_values = tokenizer(waveform, return_tensors='pt', padding=True).input_values

    # Retrieve logits from the model
    with torch.no_grad():
        logits = model(input_values).logits

    # Take argmax value and decode into transcription
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = tokenizer.batch_decode(predicted_ids)

    # Return the decoded transcription (a list with one string per input)
    return transcription

kbtruthriver · February 16, 2022, 2:10pm

Were you able to find a solution for this?

yashism · September 5, 2023, 10:09pm

WhisperX does what you want, and it worked for me.
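
For the Wav2Vec2 CTC model in the original question, word times can also be recovered from the per-frame argmax ids directly. The sketch below is one possible way to do it, not the library's own implementation: the function name `ctc_word_timestamps` and its parameters are hypothetical, and it assumes the usual Wav2Vec2 setup of one output frame per 320 input samples (20 ms at 16 kHz), a blank token, and "|" as the word delimiter. Recent versions of Transformers also appear to support `output_word_offsets=True` when decoding with the CTC tokenizer, which is worth checking in the docs before rolling your own.

```python
from typing import Dict, List, Sequence


def ctc_word_timestamps(
    frame_ids: Sequence[int],
    id_to_token: Dict[int, str],
    blank_id: int,
    word_delimiter: str = "|",
    seconds_per_frame: float = 0.02,  # assumption: 320 samples per frame at 16 kHz
) -> List[dict]:
    """Collapse greedy CTC output into words with start/end times in seconds."""
    words: List[dict] = []
    chars: List[str] = []
    start = end = 0
    prev = None

    def flush() -> None:
        # Close the word being built, if any, and record its time span.
        if chars:
            words.append({
                "word": "".join(chars),
                "start": round(start * seconds_per_frame, 3),
                "end": round(end * seconds_per_frame, 3),
            })
            chars.clear()

    for frame, tok_id in enumerate(frame_ids):
        is_char = tok_id != blank_id and id_to_token[tok_id] != word_delimiter
        if tok_id == prev:
            # A repeated frame adds no new character, but it does extend
            # the time span of the character currently being emitted.
            if is_char:
                end = frame + 1
            continue
        prev = tok_id
        if tok_id == blank_id:
            continue
        if not is_char:
            flush()  # a word delimiter ends the current word
            continue
        if not chars:
            start = frame  # first character of a new word
        chars.append(id_to_token[tok_id])
        end = frame + 1

    flush()  # the last word may not be followed by a delimiter
    return words


# Toy example with a made-up vocabulary (id 0 is the blank token).
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "j", 5: "d", 6: "å"}
frames = [0, 2, 2, 3, 0, 4, 1, 1, 5, 6, 0, 0]
result = ctc_word_timestamps(frames, vocab, blank_id=0)
for w in result:
    print(w["word"], w["start"], w["end"])
```

With the code from the question, `frame_ids` would be `predicted_ids[0].tolist()`, and the id-to-token mapping and blank id would come from the tokenizer (for Wav2Vec2 CTC tokenizers the pad token usually doubles as the blank). Greedy CTC boundaries are only approximate, so a forced-alignment tool such as WhisperX may give tighter word times.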
