Create the Mozilla Common Voice Data

I have a set of audio files and a corresponding CSV file, and I would like to turn them into a dataset structured like the Mozilla Common Voice dataset, which looks like this when you read it in Python:

{
  'client_id': 'd59478fbc1ee646a28a3c652a119379939123784d99131b865a89f8b21c81f69276c48bd574b81267d9d1a77b83b43e6d475a6cfc79c232ddbca946ae9c7afc5', 
  'path': 'et/clips/common_voice_et_18318995.mp3', 
  'audio': {
    'path': 'et/clips/common_voice_et_18318995.mp3', 
    'array': array([-0.00048828, -0.00018311, -0.00137329, ...,  0.00079346, 0.00091553,  0.00085449], dtype=float32), 
    'sampling_rate': 48000
  }, 
  'sentence': 'Tasub kokku saada inimestega, keda tunned juba ammust ajast saati.', 
  'up_votes': 2, 
  'down_votes': 0, 
  'age': 'twenties', 
  'gender': 'male', 
  'accent': '', 
  'locale': 'et', 
  'segment': ''
}

My question is: how do I create the audio column, and how was the array feature generated?

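Hi @Owos! To convert audio files to arrays, `datasets` has an `Audio` feature that decodes audio on the fly. The `array` field is the decoded waveform, which is produced when you access the example rather than being stored in advance.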

I’m not sure I understand your question, but if you want to create a custom audio dataset from your own files, similar to CommonVoice, you can check out our guide about audio datasets and the other docs in the Audio section. Feel free to ask more questions if anything is unclear, or open an issue if you think something should be changed in the docs. :slight_smile:
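For anyone landing here later, here is a minimal sketch of how the `audio` column could be built from a folder of clips plus a CSV. It is only an illustration under some assumptions: the CSV column names `file_name` and `sentence`, and the helper names `build_columns` and `to_audio_dataset`, are hypothetical and not part of `datasets`. The library calls used are `Dataset.from_dict`, `Dataset.cast_column` and `Audio`.

```python
import csv
from pathlib import Path


def build_columns(csv_path, audio_dir):
    """Read the metadata CSV and return column lists keyed by feature name.

    Assumes (hypothetically) that the CSV has 'file_name' and 'sentence' columns.
    """
    columns = {"path": [], "audio": [], "sentence": []}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            full_path = str(Path(audio_dir) / row["file_name"])
            columns["path"].append(full_path)
            # At this point 'audio' is still just a file path string.
            columns["audio"].append(full_path)
            columns["sentence"].append(row["sentence"])
    return columns


def to_audio_dataset(columns, sampling_rate=48_000):
    """Build a Dataset whose 'audio' column is decoded when an example is accessed."""
    # Imported here so the CSV helper above works without the library installed.
    from datasets import Audio, Dataset  # pip install "datasets[audio]"

    dataset = Dataset.from_dict(columns)
    # Casting marks the column as audio; files are decoded (and resampled
    # to sampling_rate) lazily, not at cast time.
    return dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))
```

Calling `to_audio_dataset(build_columns("metadata.csv", "clips"))` and then reading `["audio"]` from an example should give you the path, the decoded array and the sampling rate, as in the CommonVoice example above. The exact return type depends on your `datasets` version and the audio backend you have installed, so treat this as a starting point rather than the canonical recipe.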


Thank you so much @polinaeterna, I’ve been able to figure it out using the Audio data loader package provided by Hugging Face :hugs:!
