So I have these audio files and their corresponding csv file and I would like to make it like the moxilla common voice dataset that looks like this when you read it in python:
{
'client_id': 'd59478fbc1ee646a28a3c652a119379939123784d99131b865a89f8b21c81f69276c48bd574b81267d9d1a77b83b43e6d475a6cfc79c232ddbca946ae9c7afc5',
'path': 'et/clips/common_voice_et_18318995.mp3',
'audio': {
'path': 'et/clips/common_voice_et_18318995.mp3',
'array': array([-0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449], dtype=float32),
'sampling_rate': 48000
},
'sentence': 'Tasub kokku saada inimestega, keda tunned juba ammust ajast saati.',
'up_votes': 2,
'down_votes': 0,
'age': 'twenties',
'gender': 'male',
'accent': '',
'locale': 'et',
'segment': ''
}
My questions is: how do I create the audio column and how was the array
feature generated?