Problem with Dataset Preview with audio files

Hi @gcjavi! The currently recommended approach is no-code dataset configuration, without a custom dataset script. In your case, you can use the AudioFolder structure for your repository to make the viewer work correctly. Structure your data as described in the documentation. Note that the file with transcriptions must be called metadata.csv or metadata.jsonl, and its column names must be exactly file_name and transcription. You should also delete the Python script and update or delete the dataset_info field in the README, to avoid a mismatch between features and config names.
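As a rough sketch of the metadata step, something like the following could generate metadata.csv next to your audio files. The helper name write_metadata, the data/ directory and the example file names are all made up for illustration; only the metadata.csv file name and the file_name / transcription columns come from the advice above.

```python
import csv
from pathlib import Path


def write_metadata(data_dir, transcriptions):
    """Write metadata.csv into data_dir (hypothetical helper).

    `transcriptions` maps an audio path, relative to data_dir,
    to the text spoken in that file.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = data_dir / "metadata.csv"
    with metadata_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Column names expected by the AudioFolder layout described above.
        writer.writerow(["file_name", "transcription"])
        for file_name, text in transcriptions.items():
            writer.writerow([file_name, text])
    return metadata_path
```

You would call it as write_metadata("data", {"clip_0001.wav": "hello world"}) and then upload the data/ folder, audio files plus metadata.csv, to the dataset repo.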

This should work unless you have a really huge dataset. In that case, I recommend first creating the dataset locally with your custom Python code (you might find Dataset.from_generator() useful) and then using .push_to_hub() to push the data to a Hub repo in Parquet format.
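A minimal sketch of that local route, assuming the datasets library is installed and you are already logged in to the Hub. The function names, the column names and the repo id are placeholders, not something taken from your repository, so adapt them to your own loading code.

```python
from pathlib import Path


def generate_examples(data_dir, transcriptions):
    """Yield one example per audio file that actually exists in data_dir."""
    data_dir = Path(data_dir)
    for file_name, text in transcriptions.items():
        audio_path = data_dir / file_name
        if audio_path.is_file():
            yield {"audio": str(audio_path), "transcription": text}


def build_and_push(data_dir, transcriptions, repo_id):
    """Build the dataset locally, then upload it to the Hub (hypothetical helper)."""
    # Imported here so the generator above can be tried without the library.
    from datasets import Audio, Dataset

    dataset = Dataset.from_generator(
        generate_examples,
        gen_kwargs={"data_dir": str(data_dir), "transcriptions": transcriptions},
    )
    # Mark the path column as audio so it is stored and previewed as audio.
    dataset = dataset.cast_column("audio", Audio())
    dataset.push_to_hub(repo_id)
    return dataset
```

For example, build_and_push("data", {"clip_0001.wav": "hello world"}, "your-username/your-dataset"), where the repo id is a placeholder for your own.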

Let me know if that worked 🙂