How to load this simple audio data set and use dataset.map without memory issues?

We should change from_dict() so that it has its own cache directory, tbh; people shouldn't have to go looking for this unexpected source of OOM. I added a note to the docs to mention that the result lives in RAM: "More docs to from_dict to mention that the result lives in RAM" by lhoestq, Pull Request #7316, huggingface/datasets on GitHub.
