I’m trying to use my own data to fine-tune a Wav2Vec2 model, but every time I build my DatasetDict with map, the audio array gets converted to a plain Python list. Am I doing something incorrectly? How can I preserve the audio as an array?
import soundfile as sf

def speech_file_to_array_fn(batch):
    # Read the WAV file into a float numpy array with soundfile.
    speech_array, sampling_rate = sf.read(batch["file"])
    batch["audio"] = speech_array
    return batch

updated_dataset = my_dataset.map(speech_file_to_array_fn)
my_audio = updated_dataset['test'][0]
print(my_audio)
{'file': '/disks/data3/UASPEECH/control/CM04/CM04_B2_UW33_M3.wav', 'text': 'APPROACH', 'audio': [0.0001220703125, -0.00018310546875, 0.000152587890625, -0.00030517578125, 6.103515625e-05, 9.1552734375e-05,...}
print(type(my_audio['audio']))
<class 'list'>
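For context, this appears to be expected behavior rather than a bug: datasets stores everything in Apache Arrow tables, so the numpy array written inside map is serialized as an Arrow list, and the default Python formatter hands it back as a plain list. Below is a minimal sketch of two workarounds; the sampling_rate=16_000 value is an assumption (Wav2Vec2 checkpoints are typically trained on 16 kHz audio), and it assumes the "file" column holds paths to local WAV files.

from datasets import Audio

# Option 1: keep the map as-is and only change the output format.
# The values are still stored as Arrow lists; set_format just controls
# what indexing returns.
updated_dataset.set_format(type="numpy", columns=["audio"], output_all_columns=True)
print(type(updated_dataset['test'][0]["audio"]))  # <class 'numpy.ndarray'>

# Option 2: skip the manual map and cast the path column to the Audio
# feature, which decodes to a numpy array on access.
# (sampling_rate=16_000 is an assumption; adjust to your data/model.)
decoded = my_dataset.cast_column("file", Audio(sampling_rate=16_000))
sample = decoded['test'][0]["file"]
print(type(sample["array"]))  # <class 'numpy.ndarray'>

With the Audio feature you also get resampling to the target rate on decode, which is usually what the Wav2Vec2 feature extractor expects anyway.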