I tried the MMS model:
```python
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# switch the tokenizer and the model adapter to Arabic
processor.tokenizer.set_target_lang("ara")
model.load_adapter("ara")

def process_audio(example):
    print(example['file_name'])
    inputs = processor(example['audio']['array'], sampling_rate=16_000, return_tensors="pt")
    example['input'] = inputs
    return example
```
However, I got the following error:
```
ArrowInvalid: Could not convert {'input_values': tensor([[ 0.0291, 0.0056, 0.0116, ..., 0.0146, 0.0194,
-0.0042]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]], dtype=torch.int32)} with type BatchFeature: did
not recognize Python value type when inferring an Arrow data type
```
I'm confused, since it worked for the first 998 audio samples and then stopped at 999/2571 [00:30<00:27, 57.23ex/s]. I listened to that audio file, and there is nothing suspicious about it.
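My current guess is that the `BatchFeature` object itself is the problem: Arrow can only infer column types from plain Python/NumPy values, and `datasets.map` flushes its buffer to Arrow in batches (the default `writer_batch_size` is 1000), which would explain failing right around example 999 rather than on any particular audio file. A sketch of the workaround I'm considering, storing the raw values under top-level keys instead of the container (here with a pure-Python stand-in `FakeBatchFeature` and made-up sample values, since the real objects come from `transformers`):

```python
class FakeBatchFeature(dict):
    """Stand-in for transformers.BatchFeature: a dict subclass Arrow won't infer."""

def flatten_features(example, features):
    # Keep plain lists under top-level string keys instead of the
    # container object itself, so Arrow can infer the column types.
    example["input_values"] = list(features["input_values"][0])
    example["attention_mask"] = list(features["attention_mask"][0])
    return example

feats = FakeBatchFeature(
    input_values=[[0.0291, 0.0056, 0.0116]],   # hypothetical sample values
    attention_mask=[[1, 1, 1]],
)
example = flatten_features({"file_name": "clip_0999.wav"}, feats)
# example now holds only plain lists, which Arrow can serialize
```

Is this the right way to go, or is there a cleaner fix?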