Hi guys,
I wanted to look at the OSCAR dataset and create a subsample of it for an experiment. I followed the instructions in the documentation:
from datasets import load_dataset
dataset = load_dataset("oscar", "unshuffled_deduplicated_fa")
and it returns
DatasetDict({
    train: Dataset({
        features: ['id', 'text'],
        num_rows: 8203495
    })
})
My question is: how can I access the text data itself?
The directory that shows up when I load the dataset is
/root/.cache/huggingface/datasets/oscar/unshuffled_deduplicated_fa/1.0.0/e4f06cecc7ae02f7adf85640b4019bf476d44453f251a1d84aebae28b0f8d51d
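To make the question concrete, here is a rough sketch of what I am trying to end up with. I am assuming that the train split can be indexed with an integer and that each row comes back as a dict with the 'id' and 'text' fields listed above. The helper name subsample_texts and the toy data are made up for illustration, and the real calls are left as comments because the full download is large.

```python
import random


def subsample_texts(split, n, seed=0):
    """Return the 'text' field of n randomly chosen rows from `split`.

    `split` is assumed to support len() and integer indexing that
    returns a dict-like row with a 'text' key.
    """
    rng = random.Random(seed)
    n = min(n, len(split))  # never ask for more rows than exist
    indices = rng.sample(range(len(split)), n)  # distinct row indices
    return [split[i]["text"] for i in indices]


# Intended use with the object loaded above (assumption, not run here):
#   train = dataset["train"]
#   print(train[0]["text"])
#   texts = subsample_texts(train, 1000)

# Tiny stand-in with the same 'id'/'text' layout, only to show the shape.
toy_split = [{"id": i, "text": f"document {i}"} for i in range(10)]
sample = subsample_texts(toy_split, 3)
print(sample)
```

If that indexing assumption holds, the same function should work on the real train split without touching the cache directory by hand.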
Thanks