Hi,
I am trying to create an Indic version of Common Voice. I am new to datasets, so I am not sure how to proceed.
Can anyone please help me decide on the structure and format of the files? I have a dataset in 6 languages. For every language I have train, dev, and test splits.
What I am thinking is this:
--hi
-- --train
-- --dev
-- --test
--mr
-- --train
-- --dev
-- --test
I am planning to upload zip files for all the train, dev, and test sets. Will zips be supported, or do I have to upload individual files?
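
To make the layout above concrete, here is a minimal Python sketch that creates one folder per language and split and packs each split into its own zip. It uses only the standard library. The names `LANGS`, `SPLITS`, `build_layout`, and `zip_split` are hypothetical, and only the two language codes shown above are listed. This only illustrates the folder structure; it does not answer whether zips are supported on upload.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: only the two language codes shown in the layout above;
# the remaining four languages would be added here.
LANGS = ["hi", "mr"]
SPLITS = ["train", "dev", "test"]


def build_layout(root):
    """Create <root>/<lang>/<split>/ for every language and split."""
    dirs = []
    for lang in LANGS:
        for split in SPLITS:
            split_dir = Path(root) / lang / split
            split_dir.mkdir(parents=True, exist_ok=True)
            dirs.append(split_dir)
    return dirs


def zip_split(split_dir):
    """Pack one split folder into <lang>/<split>.zip next to it."""
    split_dir = Path(split_dir)
    archive = split_dir.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(split_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the split folder.
                zf.write(path, path.relative_to(split_dir).as_posix())
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for split_dir in build_layout(tmp):
            # Placeholder file standing in for real audio clips.
            (split_dir / "placeholder.txt").write_text("clip goes here")
            print(zip_split(split_dir).relative_to(tmp))
```

Running it prints one archive path per language and split, e.g. `hi/train.zip`, so each split can be uploaded as a single file.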