Hello friends! I've started getting an error in a previously working notebook when streaming a dataset.
- I started with the Datasets overview colab
- I built a data analysis of the LAION Aesthetics dataset, which is huge, so I load it in streaming mode
- It was working on Monday (Sep 5), but when I ran it again the next day I got the following error when trying to iterate over it
- Everything works fine when it’s not streamed
from datasets import load_dataset
dataset = load_dataset("ChristophSchuhmann/improved_aesthetics_5plus", split="train", streaming=True)
print(next(iter(dataset)))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pyarrow/io.pxi in pyarrow.lib.get_native_file()
11 frames
TypeError: not a path-like object
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pyarrow/io.pxi in pyarrow.lib.PythonFile.__cinit__()
TypeError: readable file expected
Here is a Colab replicating the error. Does anyone know what might be going wrong here?
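Since the notebook itself didn't change between Monday and Tuesday, my guess (only a guess) is that a dependency in the Colab environment got updated. Here is a small sketch for printing the versions of the packages involved. The `installed_versions` helper is my own hypothetical name, and I'm assuming `datasets`, `pyarrow`, and `fsspec` are the relevant packages and that each exposes `__version__`:

```python
import importlib


def installed_versions(package_names):
    """Return a {name: version} dict for the given importable packages."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing from this environment.
            versions[name] = "not installed"
            continue
        # Fall back to "unknown" if the module has no __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Assumed list of packages that take part in streaming a dataset.
    for name, version in installed_versions(["datasets", "pyarrow", "fsspec"]).items():
        print(f"{name}: {version}")
```

Comparing this output between a runtime where streaming works and one where it fails might help narrow down which package changed.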