Hi @sebchw! I'm not sure what's causing the error and memory overload (do you have any ideas, @lhoestq?), but note that when you provide arrays for an Audio feature, what happens under the hood is that the arrays are written to bytes and the audio is stored as bytes. Then, after you load the dataset and access samples, the audio is decoded on the fly with the `datasets` library's standard decoding. I think we should clarify this in the docs.
So if you want to apply your custom decoding with `stempeg`, you can set `decode=False` on the audio features (in `_info`) and provide only paths to local audio files in `_generate_examples`, something like:
    from pathlib import Path

    def _generate_examples(self, audio_path):
        names = ["mixture", "drums", "bass", "other", "vocals"]
        for id_, stems_path in enumerate(Path(audio_path).iterdir()):
            yield id_, {
                "name": stems_path.stem,
                # each stem column gets the same file path, passed as a string
                **{name: {"path": str(stems_path)} for name in names},
            }
and then use your custom decoding function on the loaded dataset.
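As a rough sketch of that last step (I haven't run it on real stem files): `decode_stems` below is a hypothetical helper, not part of `datasets` or `stempeg`. It takes the reader function as an argument, and it assumes the reader returns `(stems, rate)` with one entry per stem in the same order as `names`. I'd expect `stempeg.read_stems` to fit that shape for standard stem files, but please check it against your data.

```python
STEM_NAMES = ["mixture", "drums", "bass", "other", "vocals"]


def decode_stems(example, read_stems, names=STEM_NAMES):
    """Decode all stems of one example with a user-supplied reader.

    Assumption: `read_stems(path)` returns `(stems, rate)`, where
    `stems[i]` holds the audio for `names[i]`.
    """
    # All stem columns point to the same file, so read it only once.
    path = example[names[0]]["path"]
    stems, rate = read_stems(path)

    # Copy the example so the original sample is left untouched.
    decoded = dict(example)
    for i, name in enumerate(names):
        decoded[name] = {
            "path": path,
            "array": stems[i],
            "sampling_rate": rate,
        }
    return decoded
```

With `decode=False`, `ds[0]["mixture"]` should be a plain dict holding the path rather than a decoded array, so you could then call something like `decode_stems(ds[0], stempeg.read_stems)` on individual samples.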