How to stream a .hf dataset

I’m trying to load .hf datasets with streaming.
The dataset is tuan124816/newcs2_data.

from datasets import load_dataset

dataset = load_dataset("tuan124816/newcs2_data",
                       streaming=True)
hf_dataset = dataset['test']
print(hf_dataset)

Output:

IterableDataset({
    features: Unknown,
    n_shards: 600
})

When I print the first element:

print(next(iter(hf_dataset)))

Output:

{'_data_files': [{'filename': 'data-00000-of-00001.arrow'}], '_fingerprint': '905978a8bab44335', '_format_columns': ['observations', 'actions', 'rewards'], '_format_kwargs': {}, '_format_type': None, '_output_all_columns': False, '_split': None}

Is this the right way to load this kind of dataset?
How can I read the data and see what is inside ['observations', 'actions', 'rewards']?

This?

From what I can see, streaming only loads the state.json file from each .hf folder.

state.json:

{
  "_data_files": [
    {
      "filename": "data-00000-of-00001.arrow"
    }
  ],
  "_fingerprint": "905978a8bab44335",
  "_format_columns": [
    "observations",
    "actions",
    "rewards"
  ],
  "_format_kwargs": {},
  "_format_type": null,
  "_output_all_columns": false,
  "_split": null
}
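
To make it concrete that this file is only metadata and not the rows themselves, here is a minimal sketch that parses the same JSON with the standard library and pulls out the Arrow file name and the column names. The variable names (`state_json`, `arrow_files`, `columns`) are illustrative, not part of any library.

```python
import json

# The state.json content shown above, pasted as a string for illustration.
state_json = """
{
  "_data_files": [{"filename": "data-00000-of-00001.arrow"}],
  "_fingerprint": "905978a8bab44335",
  "_format_columns": ["observations", "actions", "rewards"],
  "_format_kwargs": {},
  "_format_type": null,
  "_output_all_columns": false,
  "_split": null
}
"""

state = json.loads(state_json)

# The actual rows live in the Arrow file(s) listed here, not in state.json.
arrow_files = [entry["filename"] for entry in state["_data_files"]]

# These are the column names only; the values are stored in the Arrow file.
columns = state["_format_columns"]

print(arrow_files)
print(columns)
```

So the dictionary returned by the stream is one row of metadata per state.json, which would explain why no observations, actions, or rewards values appear in it.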

In my experience, other datasets such as imdb, when streamed, always give a clear dictionary output with text and label. I’m confused about why the Arrow file isn’t loaded.
Am I doing something wrong?