I tried the MMS model:
```python
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch

model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# switch the tokenizer and the model adapter to Arabic
processor.tokenizer.set_target_lang("ara")
model.load_adapter("ara")

def process_audio(example):
    print(example['file_name'])
    inputs = processor(example['audio']['array'], sampling_rate=16_000, return_tensors="pt")
    example['input'] = inputs
    return example
```
However, I got the following error:
```
ArrowInvalid: Could not convert {'input_values': tensor([[ 0.0291, 0.0056, 0.0116, ..., 0.0146, 0.0194,
-0.0042]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]], dtype=torch.int32)} with type BatchFeature: did
not recognize Python value type when inferring an Arrow data type
```
I'm confused, since it worked for the first 998 audio samples and then stopped at 999/2571 [00:30<00:27, 57.23ex/s]. I listened to that audio file, and there is nothing suspicious about it.
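My current guess is that the `BatchFeature` object itself is the problem: Arrow can only infer column types from plain Python/NumPy values, and `datasets.map` flushes its buffer to Arrow in batches (the default `writer_batch_size` is 1000), which would explain failing right around example 999 rather than on any particular audio file. A sketch of the workaround I'm considering, storing the raw values under top-level keys instead of the container (here with a pure-Python stand-in `FakeBatchFeature` and made-up sample values, since the real objects come from `transformers`):

```python
class FakeBatchFeature(dict):
    """Stand-in for transformers.BatchFeature: a dict subclass Arrow won't infer."""

def flatten_features(example, features):
    # Keep plain lists under top-level string keys instead of the
    # container object itself, so Arrow can infer the column types.
    example["input_values"] = list(features["input_values"][0])
    example["attention_mask"] = list(features["attention_mask"][0])
    return example

feats = FakeBatchFeature(
    input_values=[[0.0291, 0.0056, 0.0116]],   # hypothetical sample values
    attention_mask=[[1, 1, 1]],
)
example = flatten_features({"file_name": "clip_0999.wav"}, feats)
# example now holds only plain lists, which Arrow can serialize
```

Is this the right way to go, or is there a cleaner fix?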