Datasets map converting audio array to a list?

I’m trying to use my own data for fine-tuning a Wav2Vec2 model, but every time I create my DatasetDict and run map, the audio array gets converted to a list. Am I doing something incorrectly? How can I preserve the audio as an array?

import soundfile as sf

def speech_file_to_array_fn(batch):
    # Read the waveform from disk and store it in the "audio" column
    speech_array, sampling_rate = sf.read(batch["file"])
    batch["audio"] = speech_array
    return batch

updated_dataset = my_dataset.map(speech_file_to_array_fn)
my_audio = updated_dataset['test'][0]
print(my_audio)
{'file': '/disks/data3/UASPEECH/control/CM04/CM04_B2_UW33_M3.wav', 'text': 'APPROACH', 'audio': [0.0001220703125, -0.00018310546875, 0.000152587890625, -0.00030517578125, 6.103515625e-05, 9.1552734375e-05,...}

print(type(my_audio['audio']))
<class 'list'>

Hi,

set the dataset format to NumPy to get the audio back as an array, as follows:

# Return the "audio" column as NumPy arrays; output_all_columns=True keeps the other columns too
updated_dataset.set_format("numpy", columns=["audio"], output_all_columns=True)
updated_dataset["test"][0]
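As a quick sanity check (a minimal sketch, assuming the same updated_dataset as above), indexing the formatted dataset should now give a NumPy array instead of a list:

sample = updated_dataset["test"][0]
print(type(sample["audio"]))
# expected: <class 'numpy.ndarray'>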