Error io.BufferedReader

I have loaded an audio dataset from local files in a folder, and I want to build a list of the paths to all my audio files with the following code:

paths = []
for i in range(len(dataset['train'])):
    # append, not extend: extend would add the path string one character at a time
    paths.append(dataset['train'][i]['audio']['path'])

I get the following error:

RuntimeError: Error opening <_io.BufferedReader name='/home/kareem/Desktop/deep_learning/vision_projects/zikir/my_own_data_jetforms/_2/64721fd4b828e_168520085264721fd4bd6c7.wav'>: Format not recognised.

This is really strange: why can't it just print the paths?


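It looks like some samples can't be read (possibly corrupted files?) when you access them in your for loop. Accessing dataset['train'][i] decodes that sample's audio, even if you only need its path, so any file that can't be decoded raises this error.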
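To find out which files are the culprits, one option is to inspect their headers before loading the dataset. This is a minimal sketch under stated assumptions: the clips are `.wav` files somewhere under one local folder, and a file counts as suspicious when its first 12 bytes are not the standard RIFF/WAVE header. It uses only Python's standard library, so it is a weaker test than actually decoding the audio: a file that passes can still fail to open, and valid WAV variants with a different magic (e.g. RF64) would be reported too. Both function names are hypothetical.

```python
from pathlib import Path


def has_riff_wave_header(path):
    """Return True if the file starts with a standard RIFF/WAVE header.

    Heuristic only: checks "RIFF" at offset 0 and "WAVE" at offset 8,
    without validating the rest of the file.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def find_suspicious_wavs(folder):
    """Hypothetical helper: sorted paths of .wav files under `folder`
    (searched recursively) that lack a RIFF/WAVE header."""
    return sorted(
        str(p) for p in Path(folder).rglob("*.wav") if not has_riff_wave_header(p)
    )
```

For example, `find_suspicious_wavs("path/to/your/audio/folder")` (placeholder path) returns the files worth inspecting or removing before loading the dataset again.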

You can disable audio decoding so that you can iterate over your dataset without reading the audio data, which avoids this error:

from datasets import Audio

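# Keep the audio undecoded: each "audio" value becomes a dict with "path" and "bytes"
dataset = dataset.cast_column("audio", Audio(decode=False))
for example in dataset["train"]:  # iterate over the rows of one split, not the DatasetDict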
    ...
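Since the original goal was a list of paths, the loop can simply read each example's "audio" entry: with `decode=False`, that value is a dict holding the file's `"path"` and `"bytes"` rather than a decoded array, so no file is opened. A minimal sketch, assuming the cast above has been applied; `collect_audio_paths` is a hypothetical helper name, and it accepts any iterable of examples, such as `dataset["train"]`.

```python
def collect_audio_paths(split):
    """Hypothetical helper: return the "path" of every example's audio entry.

    Assumes the "audio" column was cast with Audio(decode=False), so each
    example["audio"] is a dict like {"path": ..., "bytes": ...} and
    iterating never decodes the audio files.
    """
    # One path string per example, in dataset order
    return [example["audio"]["path"] for example in split]


# Usage, after dataset = dataset.cast_column("audio", Audio(decode=False)):
# paths = collect_audio_paths(dataset["train"])
```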