We should change from_dict() so that it has its own cache directory, tbh. People shouldn't have to track down this unexpected source of OOM. I added a note to the docs mentioning that the result lives in RAM: "More docs to from_dict to mention that the result lives in RAM" (huggingface/datasets Pull Request #7316 by lhoestq, on GitHub).