Custom dataset and cast_column

Hi, I have my own dataset: a set of .wav files plus a CSV file with two columns, audio and text. I load the corpus with the datasets library:

from datasets import load_dataset
my_dataset = load_dataset('en-dataset')

The output is:

DatasetDict({
    train: Dataset({
        features: ['audio', 'text'],
        num_rows: 4
    })
})

But when I cast the audio column with cast_column:

import datasets
dataset = my_dataset.cast_column("audio", datasets.features.Audio(sampling_rate=16_000))

I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/content/asd_1.wav'

Any thoughts on how to resolve it?

Does the “audio” column store the path to the audio file? Can you make sure the path is either a valid relative path from your working directory or an absolute path?
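A minimal sketch of that check, using only the Python standard library. The error path /content/asd_1.wav suggests a bare filename was resolved against the current working directory. This sketch assumes (not stated in the thread) that the paths in the CSV are meant to be relative to the folder containing the CSV file. The helper absolutize_audio_paths and the file names metadata.csv and clips/ are hypothetical. It rewrites every entry of the audio column to an absolute path and reports any files that still do not exist.

```python
import csv
import os
import tempfile


def absolutize_audio_paths(csv_path, out_path, column="audio"):
    """Copy csv_path to out_path with every path in `column` made absolute.

    Assumption: relative paths are interpreted relative to the folder that
    holds the CSV file. Returns the absolute paths that do not exist on disk.
    """
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    missing = []
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            path = row[column]
            if not os.path.isabs(path):
                # Resolve against the CSV's folder, not the current working directory
                path = os.path.normpath(os.path.join(base_dir, path))
            if not os.path.isfile(path):
                missing.append(path)
            row[column] = path
            rows.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return missing


if __name__ == "__main__":
    # Hypothetical layout: metadata.csv sits next to a clips/ folder
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "clips"))
        # Empty placeholder file standing in for a real recording
        open(os.path.join(tmp, "clips", "asd_1.wav"), "wb").close()
        meta = os.path.join(tmp, "metadata.csv")
        with open(meta, "w", newline="", encoding="utf-8") as f:
            f.write("audio,text\nclips/asd_1.wav,hello\nclips/asd_2.wav,world\n")
        missing = absolutize_audio_paths(meta, os.path.join(tmp, "metadata_abs.csv"))
        print([os.path.basename(p) for p in missing])  # ['asd_2.wav']
```

If the returned list is empty, the rewritten CSV can be loaded with load_dataset("csv", data_files="metadata_abs.csv") and the cast_column call from the question applied unchanged. Absolute paths avoid any dependence on where the script or notebook happens to be run from.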
