Loading a dataset in streaming mode

I am trying to load a dataset in streaming mode.
I am currently using datasets version 1.8,
but it produces the following error.


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-518060a18801> in <module>()
----> 1 dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True)

3 frames
/usr/local/lib/python3.7/dist-packages/datasets/builder.py in _create_builder_config(self, name, custom_features, **config_kwargs)
    339                 if value is not None:
    340                     if not hasattr(builder_config, key):
--> 341                         raise ValueError(f"BuilderConfig {builder_config} doesn't have a '{key}' key.")
    342                     setattr(builder_config, key, value)
    343 

ValueError: BuilderConfig OscarConfig(name='unshuffled_deduplicated_en', version=1.0.0, data_dir=None, data_files=None, description='Unshuffled and deduplicated, English OSCAR dataset') doesn't have a 'streaming' key.

Quick notebook link.

@valhalla Can you take a look?

Hi, I ran into the same issue using Colab, so I ended up cloning the datasets GitHub repo and installing it from source. Here’s what I ran:

!git clone https://github.com/huggingface/datasets.git && cd datasets && pip install -q -e ".[streaming]"

Afterwards, I had to restart the runtime to make things work. Note that it installs the 1.8.1.dev0 version of datasets. Notebook link here.
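Once the runtime has restarted, a quick way to confirm that streaming works is to pull just the first few examples instead of iterating over the whole corpus. This is only a minimal sketch: `peek` and `load_streaming_sample` are hypothetical helper names, and it assumes the object returned with `streaming=True` can be iterated like any other Python iterable.

```python
from itertools import islice


def peek(stream, n=3):
    """Return the first n items of any iterable as a list."""
    return list(islice(stream, n))


def load_streaming_sample(n=3):
    """Load OSCAR in streaming mode and return the first n examples.

    Needs network access and a datasets install that supports streaming.
    """
    # Imported here so that peek() stays usable without datasets installed.
    from datasets import load_dataset

    dataset = load_dataset(
        "oscar", "unshuffled_deduplicated_en", split="train", streaming=True
    )
    return peek(dataset, n)
```

Calling `load_streaming_sample()` should return quickly because only the first few records are fetched. If it raises the same ValueError as above, the old version is probably still the one being imported.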

Cheers!

Ahh, got it! Streaming was not included in the released version yet.
We just need to install from source.
@w11wo

Thank you for your reply. I ran into a similar error:

ValueError: BuilderConfig ReazonSpeechConfig(name='all', version=0.0.0, data_dir=None, data_files=None, description=None) doesn't have a 'trust_remote_code' key.

I solved it by running:

git clone https://github.com/huggingface/datasets.git && cd datasets && pip install -q -e ".[trust_remote_code]"
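Both errors in this thread follow the same pattern. The traceback shows `_create_builder_config(..., **config_kwargs)`, so a keyword the installed version does not recognise is forwarded to the BuilderConfig and rejected there. One way to check this before calling `load_dataset` is to inspect its signature. The sketch below assumes that a version supporting `streaming` or `trust_remote_code` declares it as an explicit parameter of `load_dataset`. `names_parameter` and `check_load_dataset` are hypothetical helper names.

```python
import inspect


def names_parameter(func, name):
    """True if func declares `name` explicitly, not merely via **kwargs."""
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind is not inspect.Parameter.VAR_KEYWORD


def check_load_dataset(name):
    """Report whether the installed load_dataset declares the given keyword."""
    # Imported here so that names_parameter() works without datasets installed.
    from datasets import load_dataset

    return names_parameter(load_dataset, name)
```

If `check_load_dataset("trust_remote_code")` returns False, the keyword would likely be forwarded to the BuilderConfig and fail as above. Installing a different version (and restarting the runtime) is then the thing to try.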