Natural Questions Dataset is not streamable

As this dataset needs 134 GB of disk space to load, I thought the streaming feature of load_dataset would come in handy here, but apparently streaming is not possible for Natural Questions. I had assumed that all datasets on Hugging Face were streamable. Was my assumption incorrect?

NQ’s original data is in a format that is not easily streamable, as is also the case for Wikipedia.

That said, we’re hosting processed versions of the dataset in Arrow format, so it should be possible to stream from there, but this is not implemented at the moment.
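
For reference, here is a minimal sketch of the kind of streaming call being discussed. It assumes the dataset identifier is "natural_questions" and that the split is "train"; both are assumptions and may differ on the Hub. The helper names peek and stream_natural_questions are made up for illustration. As noted above, the call may simply fail for this dataset, so the sketch catches the error instead of assuming it works.

```python
from itertools import islice


def peek(examples, n=3):
    """Return the first n items of any iterable as a list."""
    return list(islice(examples, n))


def stream_natural_questions(split="train"):
    """Try to open Natural Questions in streaming mode (hypothetical helper)."""
    # Imported lazily so that peek() can be used without `datasets` installed.
    from datasets import load_dataset

    # streaming=True asks for an iterable dataset instead of a full download.
    # The identifier "natural_questions" is an assumption, not confirmed here.
    return load_dataset("natural_questions", split=split, streaming=True)


if __name__ == "__main__":
    try:
        for example in peek(stream_natural_questions()):
            # Print only the field names to keep the output short.
            print(list(example.keys()))
    except Exception as err:
        # Expected when streaming is not supported for this dataset.
        print(f"Streaming failed: {err}")
```

If the call fails, the fallback is a regular load_dataset call without streaming=True, which requires the full disk space mentioned in the question.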

Could you please provide the link to the processed dataset?

Here it is: