Accessing the OSCAR dataset

Hi guys,
I wanted to look at the OSCAR dataset and create a subsample of it for an experiment. I followed the instructions in the documentation:

from datasets import load_dataset

dataset = load_dataset("oscar", "unshuffled_deduplicated_fa")

and it returns

DatasetDict({
    train: Dataset({
        features: ['id', 'text'],
        num_rows: 8203495
    })
})

My question is: how can I access the text data itself?
The directory that shows up when I load the dataset is
/root/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_fa/1.0.0/e4f06cecc7ae02f7adf85640b4019bf476d44453f251a1d84aebae28b0f8d51d

Thanks :)

You can index into the returned object directly: dataset['train']['text'], for instance.
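To expand on that a bit, here is a rough sketch of reading the text and cutting a subsample, which is what the question was after. The helper names (load_oscar_fa_train, first_texts, random_subsample) are made up for illustration and are not part of the datasets library; the only library calls assumed are load_dataset, Dataset.select and Dataset.shuffle. Whether load_dataset("oscar", ...) still works as written may depend on your installed datasets version.

```python
def load_oscar_fa_train():
    """Load the train split, using the same call as in the question."""
    # Imported here so the helpers below can be used without triggering a download.
    from datasets import load_dataset

    dataset = load_dataset("oscar", "unshuffled_deduplicated_fa")
    return dataset["train"]


def first_texts(split, n):
    """Return the 'text' values of the first n rows as a plain list."""
    n = min(n, len(split))
    # select() keeps only the requested rows, so only n texts are read,
    # instead of pulling the whole 'text' column at once.
    return list(split.select(range(n))["text"])


def random_subsample(split, n, seed=42):
    """Return a reproducible random subset with at most n rows."""
    n = min(n, len(split))
    # Shuffle with a fixed seed, then keep the first n rows.
    return split.shuffle(seed=seed).select(range(n))
```

Usage would look something like train = load_oscar_fa_train(), then train[0]["text"] for a single document, first_texts(train, 5) for a quick look, and random_subsample(train, 10000) for the experiment subset. On a split with about 8 million rows, going through select first is probably gentler on memory than asking for dataset['train']['text'] in full.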
