I’m trying to run a simple wav2vec2 example and I’m getting an error:
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2FeatureExtractor
import librosa
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
SPEECH_FILE = "/home/heb.wav"
waveform, sample_rate = librosa.load(SPEECH_FILE, sr=16000)
features = feature_extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(features.input_values).logits
# Decode the predicted transcription from the logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = feature_extractor.decode(predicted_ids[0])
print(transcription)
This is the error I get:
AttributeError: 'Wav2Vec2FeatureExtractor' object has no attribute 'decode'
How can I decode the wav2vec2 output?
(I’m using transformers version: 4.11.3)
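For reference, this is the decoding pattern I’ve seen in other examples. It goes through Wav2Vec2Processor.batch_decode with a CTC fine-tuned checkpoint (facebook/wav2vec2-base-960h below is only for illustration, not the model I actually want to use); I don’t know whether the same pattern can be made to work with the xlsr-53 checkpoint:

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# NOTE: checkpoint chosen only for illustration; as far as I know it ships a tokenizer
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

# load the audio resampled to 16 kHz, as the model expects
waveform, sample_rate = librosa.load("/home/heb.wav", sr=16000)
inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# greedy decoding: most likely token per frame, then the processor collapses repeats/blanks
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)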