Error when setting up the Dataset Viewer - StreamingRowsError

j-krzywdziak · August 18, 2023, 8:14pm

Hi! I have a problem uploading the MOCKS dataset with our custom splits enabled. I’ve followed the guides and tutorials, but each time I get the same error. I want the viewer to show the audio id, the audio, and the transcription. When I read the transcription from the tsv file (transcriptions are in the second column) using the proper index (row[1]), I get IndexError: list index out of range. Could you please help me figure this out? I’ve also tried reading the tsv files without the download_and_extract function, and the result is the same. Thanks in advance!

mariosasko · August 21, 2023, 5:05pm

I opened a PR that fixes the dataset script and makes it streamable here: voiceintelligenceresearch/MOCKS · Fix dataset script
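
For readers who hit the same IndexError on row[1], here is a minimal sketch of a defensive way to read a tab-separated transcription file. It is not the code from the PR. It assumes the usual causes: the file is parsed with the default comma delimiter instead of a tab, or it contains blank or malformed lines. The function name read_transcriptions and the sample rows are hypothetical.

```python
import csv
import io


def read_transcriptions(tsv_file, id_col=0, text_col=1):
    """Yield (audio_id, transcription) pairs from a tab-separated file object.

    Hypothetical helper: the column positions are assumptions based on the
    description above (id in the first column, transcription in the second).
    """
    # Split on tabs explicitly; the default delimiter is a comma, which would
    # leave each line as a single-element row and make row[1] fail.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Blank lines come back as [] and malformed lines as short rows,
        # so check the length before indexing.
        if len(row) <= max(id_col, text_col):
            continue
        yield row[id_col], row[text_col]


# Made-up sample data: one blank line and one line without a tab.
sample = io.StringIO(
    "a1\thello world\n"
    "\n"
    "a2\tsecond, with comma\n"
    "broken_line_without_tab\n"
)
pairs = list(read_transcriptions(sample))
```

In a loading script, the same check can go inside _generate_examples so that one bad line does not abort streaming. Whether to skip or raise on short rows is a design choice; skipping silently can hide real data problems.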

j-krzywdziak · August 21, 2023, 6:27pm

Yes! Great, thanks a lot! Do you know why the all and es.MCV configs are not available?

mariosasko · August 21, 2023, 7:18pm

Please merge this PR to fix the es.MCV config.

The all config fails to stream because the hyphen character (-) is not allowed in a split name. I’ll open a PR in the datasets lib to remove this limitation, but you’ll have to wait until the next datasets release for the fix to be visible in the viewer.
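
Until such a release is out, split names can be checked or renamed before they reach the library. The sketch below assumes the restriction is that a split name must consist of word characters (letters, digits, underscore), optionally separated by dots. The pattern is written out here as an assumption and is not imported from datasets, and the helper names and the example split names are hypothetical.

```python
import re

# Assumed rule: one or more word-character groups joined by dots.
SPLIT_NAME_RE = re.compile(r"^\w+(\.\w+)*$")


def is_valid_split_name(name: str) -> bool:
    """Return True if the name fits the assumed split-name pattern."""
    return bool(SPLIT_NAME_RE.match(name))


def sanitize_split_name(name: str) -> str:
    """Replace every character other than a word character or dot with '_'."""
    return re.sub(r"[^\w.]", "_", name)


# Hypothetical split names, for illustration only.
hyphenated = "offline-train"
cleaned = sanitize_split_name(hyphenated)
```

With these helpers, is_valid_split_name(hyphenated) is False while is_valid_split_name(cleaned) is True, so renaming hyphenated splits in the loading script is one way to keep a config streamable in the meantime.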

j-krzywdziak · August 21, 2023, 7:40pm

ok sure, thank you again!
