I’m using the following code to load my local audio files as a dataset and it’s working fine:
from datasets import load_dataset, Dataset, Audio
dataset = Dataset.from_dict({"audio": ['test.mp3']}).cast_column("audio", Audio())
I’m also running several models (speech detection, silence detection) to find the time ranges I want inside each audio file.
Based on their output, I use ffmpeg to cut each file into hundreds of segment files and feed those files into different models, which also works fine.
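Instead of cutting files with ffmpeg, the detected timestamps could be applied to the decoded samples directly by converting seconds to sample indices (seconds × sampling rate). A minimal sketch, assuming the detection models return (start, end) pairs in seconds and that `dataset[0]['audio']` decodes to a dict with `array` and `sampling_rate` keys (as in `datasets` releases before 4.0; newer releases may return a decoder object instead). `cut_segments` is a hypothetical helper, not a `datasets` API:

```python
from typing import Any, Dict, List, Sequence, Tuple


def cut_segments(
    audio: Dict[str, Any], spans_s: Sequence[Tuple[float, float]]
) -> List[Dict[str, Any]]:
    """Slice a decoded audio dict into in-memory segments.

    `spans_s` holds (start, end) times in seconds, e.g. from the detection models
    (hypothetical input format).
    """
    sr = audio["sampling_rate"]
    samples = audio["array"]
    n = len(samples)
    segments = []
    for start_s, end_s in spans_s:
        # Convert seconds to sample indices and clamp them to the signal length.
        start = max(0, min(n, int(round(start_s * sr))))
        end = max(start, min(n, int(round(end_s * sr))))
        segments.append({"array": samples[start:end], "sampling_rate": sr})
    return segments


# Stand-in for dataset[0]["audio"]: 3 s at 16 kHz (a list instead of a float numpy array).
decoded = {"array": list(range(3 * 16000)), "sampling_rate": 16000}
segs = cut_segments(decoded, [(0.5, 1.0), (2.0, 2.25)])
print([len(s["array"]) for s in segs])  # [8000, 4000]
```

With real decoded audio the `array` is a numpy array, and basic slicing on a numpy array returns a view, so the segments share memory with the decoded signal instead of being written to disk.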
Still, writing hundreds of intermediate files like this seems inefficient.
So I tried slicing the decoded array in place instead, but it didn’t work:
dataset['audio'][0]['array'] = dataset['audio'][0]['array'][:1000000]
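One likely reason the assignment has no effect: `dataset['audio']` builds a fresh Python list from the underlying Arrow table each time it is accessed, so the slice is written into a temporary copy and the `Dataset` itself is unchanged. New rows are normally derived with `Dataset.map`, which returns a new dataset; with `batched=True` the function may return more rows than it receives. A minimal sketch, assuming a hypothetical `spans` column holding the (start, end) times in seconds from the detection models and the pre-4.0 dict form of decoded audio; `split_into_segments` is a hypothetical function:

```python
from typing import Any, Dict, List


def split_into_segments(batch: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Batched map function: turn each (audio, spans) row into one row per span.

    Assumes a hypothetical "spans" column of (start_s, end_s) pairs in seconds.
    """
    out: Dict[str, List[Any]] = {"array": [], "sampling_rate": [], "start_s": [], "end_s": []}
    for audio, spans in zip(batch["audio"], batch["spans"]):
        sr = audio["sampling_rate"]
        samples = audio["array"]
        for start_s, end_s in spans:
            # Seconds -> sample indices; slicing builds new rows, nothing is mutated.
            start, end = int(round(start_s * sr)), int(round(end_s * sr))
            out["array"].append(samples[start:end])
            out["sampling_rate"].append(sr)
            out["start_s"].append(start_s)
            out["end_s"].append(end_s)
    return out


# With a real dataset (not run here), the output has more rows than the input,
# so the original columns are removed:
# segments = dataset.map(split_into_segments, batched=True,
#                        remove_columns=dataset.column_names)

# Local check with a hand-made batch (lists stand in for decoded numpy arrays):
demo_batch = {
    "audio": [
        {"array": list(range(16000)), "sampling_rate": 8000},  # 2 s
        {"array": list(range(8000)), "sampling_rate": 8000},   # 1 s
    ],
    "spans": [[(0.0, 0.5), (1.0, 1.5)], [(0.25, 0.5)]],
}
result = split_into_segments(demo_batch)
print([len(a) for a in result["array"]])  # [4000, 4000, 2000]
```

Removing the original columns is what lets a batched `map` change the number of rows; keeping `start_s` and `end_s` preserves where each segment came from in the source file.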
What is the right way to cut segments out of the decoded audio without writing separate files?