Problem with Dataset Preview with audio files

polinaeterna · March 7, 2024, 11:15am

hi @gcjavi! The recommended approach currently is to use no-code dataset configuration, without custom dataset scripts. In your case you can use the AudioFolder structure for your repository to make the viewer work correctly. You need to structure your data according to the documentation; note that the file with transcriptions must be called metadata.csv or metadata.jsonl, and the column names should be strictly file_name and transcription. You should also delete the Python script, and update or delete the dataset_info field in the README to avoid a mismatch between features and config names.
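In case it helps, here is a minimal sketch of generating the metadata file with plain Python. The helper name `write_metadata` and the clip names are made up for illustration; only the metadata.csv file name and the file_name / transcription columns come from the advice above. It assumes the audio files sit in the same folder as the metadata file, so file_name holds paths relative to it.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata(data_dir, transcriptions):
    """Write metadata.csv mapping each audio file to its transcription.

    `transcriptions` is a dict of {relative audio path: text}.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = data_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        # Column names follow the naming described in the post.
        writer = csv.DictWriter(f, fieldnames=["file_name", "transcription"])
        writer.writeheader()
        for file_name, text in transcriptions.items():
            writer.writerow({"file_name": file_name, "transcription": text})
    return metadata_path


if __name__ == "__main__":
    # Hypothetical clip names, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_metadata(tmp, {"clip_0001.wav": "hello world"})
        print(path.read_text(encoding="utf-8"))
```

After that, the repo would contain just the audio files plus metadata.csv, with no loading script.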

This should work unless you have a really huge dataset. In that case I recommend first creating the dataset locally with your custom Python code (you might find Dataset.from_generator() useful) and then using .push_to_hub() to push the data to a Hub repo in Parquet format.
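A rough sketch of that second route, assuming you already have a list of (audio path, transcription) pairs from your own code. The function names, the `rows` argument and the repo id are placeholders, not anything from your repository, and pushing requires being logged in to the Hub.

```python
def generate_examples(rows):
    """Yield one example per (audio_path, transcription) pair."""
    for audio_path, transcription in rows:
        yield {"audio": audio_path, "transcription": transcription}


def build_and_push(rows, repo_id):
    """Build a dataset from the generator and push it to the Hub."""
    # Imported here so the generator above can be used without `datasets` installed.
    from datasets import Audio, Dataset

    ds = Dataset.from_generator(generate_examples, gen_kwargs={"rows": rows})
    # Treat the path column as audio so the viewer can render a player.
    ds = ds.cast_column("audio", Audio())
    # Uploads the data to the Hub repo as Parquet files.
    ds.push_to_hub(repo_id)
    return ds


# Example call (placeholder paths and repo id):
# build_and_push([("clips/clip_0001.wav", "hello world")], "your-username/your-dataset")
```

For a very large dataset you would replace the in-memory `rows` list with whatever iteration your custom script already does, and yield from there.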

Let me know if that worked

