Loading a dataset in streaming mode

I am trying to load a dataset in streaming mode.
I am currently using datasets version 1.8,
but it produces the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-3-518060a18801> in <module>()
----> 1 dataset = load_dataset('oscar', "unshuffled_deduplicated_en", split='train', streaming=True)

3 frames
/usr/local/lib/python3.7/dist-packages/datasets/builder.py in _create_builder_config(self, name, custom_features, **config_kwargs)
    339                 if value is not None:
    340                     if not hasattr(builder_config, key):
--> 341                         raise ValueError(f"BuilderConfig {builder_config} doesn't have a '{key}' key.")
    342                     setattr(builder_config, key, value)

ValueError: BuilderConfig OscarConfig(name='unshuffled_deduplicated_en', version=1.0.0, data_dir=None, data_files=None, description='Unshuffled and deduplicated, English OSCAR dataset') doesn't have a 'streaming' key.

Quick notebook link.

@valhalla Can you take a look?

Hi, I ran into the same issue using Colab. I ended up cloning the entire datasets GitHub repo and installing it from there. Here’s what I ran:

!git clone https://github.com/huggingface/datasets.git && cd datasets && pip install -q -e ".[streaming]"

Afterwards, I had to restart the runtime to make things work. Note that it installs the 1.8.1.dev0 version of datasets. Notebook link here.



Ahh!!! Got it. Streaming was not yet included in the released version.
We just need to install from source.

Thank you for your reply. I ran into a similar error:

ValueError: BuilderConfig ReazonSpeechConfig(name='all', version=0.0.0, data_dir=None, data_files=None, description=None) doesn't have a 'trust_remote_code' key.

I solved it by running:

git clone https://github.com/huggingface/datasets.git && cd datasets && pip install -q -e ".[trust_remote_code]"
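
Both errors in this thread have the same shape: the installed datasets version does not know the keyword (`streaming`, `trust_remote_code`), so it appears to be forwarded to the builder config, which then rejects it. A minimal sketch of a check for this is below. It assumes that newer versions of datasets declare these keywords as named parameters of `load_dataset`. The helper names `has_explicit_param` and `check_installed_datasets`, and the two stand-in loader functions, are hypothetical and only there for illustration.

```python
import inspect


def has_explicit_param(func, name):
    """Return True if `func` declares `name` as a named parameter.

    A catch-all such as **config_kwargs does not count, because a keyword
    swallowed that way is only forwarded, not understood.
    """
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind is not inspect.Parameter.VAR_KEYWORD


# Stand-ins that mimic the two situations (not the real datasets functions).
def old_style_loader(path, name=None, **config_kwargs):
    return None


def new_style_loader(path, name=None, streaming=False, **config_kwargs):
    return None


print(has_explicit_param(old_style_loader, "streaming"))  # False
print(has_explicit_param(new_style_loader, "streaming"))  # True


def check_installed_datasets(keyword):
    """Report whether the installed datasets declares `keyword` on load_dataset."""
    try:
        import datasets
    except ImportError:
        return "datasets is not installed"
    supported = has_explicit_param(datasets.load_dataset, keyword)
    return f"datasets {datasets.__version__}: '{keyword}' supported = {supported}"
```

Calling `check_installed_datasets("streaming")` or `check_installed_datasets("trust_remote_code")` after restarting the runtime should show whether the upgrade took effect before you retry `load_dataset`.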