Is it possible to reuse only part of an already loaded audio dataset?

jeemin1 · June 14, 2024, 4:51am

I’m using the following code to load my local audio files as a dataset and it’s working fine:

from datasets import load_dataset, Dataset, Audio

dataset = Dataset.from_dict({"audio": ['test.mp3']}).cast_column("audio", Audio())

I’m also running several models, such as speech detection and silence detection, to find the time ranges I want inside the audio file.

Based on those time ranges, I’m using ffmpeg to cut the file into hundreds of smaller files and feeding those into different models, which also works fine.

However, I think writing out hundreds of files like this is inefficient.

So I tried this, but it didn’t work:

dataset['audio'][0]['array'] = dataset['audio'][0]['array'][:1000000]

What should I do?
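
For context, the assignment above most likely has no effect because indexing the dataset returns a freshly decoded copy of the audio, so the change never reaches the dataset itself. One possible direction is to decode the file once and slice the sample array in memory instead of cutting files with ffmpeg. The sketch below is only an illustration. `slice_segments` is a hypothetical helper, the time ranges are made up, and a plain Python list stands in for the decoded samples so the snippet runs on its own. It assumes a `datasets` version where the decoded audio is a dict with `array` and `sampling_rate` keys.

```python
def slice_segments(samples, sampling_rate, segments_sec):
    """Return one slice of `samples` per (start_sec, end_sec) pair.

    `samples` can be any sliceable sequence, e.g. the NumPy array
    found under example["audio"]["array"] (assumed layout).
    """
    clips = []
    for start_sec, end_sec in segments_sec:
        # Convert seconds to sample indices.
        start = int(round(start_sec * sampling_rate))
        end = int(round(end_sec * sampling_rate))
        clips.append(samples[start:end])
    return clips


# Stand-in for the decoded audio: 3 seconds at a toy rate of 8 samples/s.
sampling_rate = 8
samples = list(range(3 * sampling_rate))

# Hypothetical output of a speech/silence detector, in seconds.
segments_sec = [(0.0, 1.0), (1.5, 2.0)]

clips = slice_segments(samples, sampling_rate, segments_sec)
print([len(clip) for clip in clips])
```

With a real dataset, `samples` and `sampling_rate` would come from something like `example = dataset[0]` followed by `example["audio"]["array"]` and `example["audio"]["sampling_rate"]`. If the clips need to live in a dataset again, `Dataset.map` or `Dataset.from_dict` may be worth a look.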
