Loading specific features in a JSON dataset

Hi. Is there a way to load specific fields in a dataset stored in JSON or JSON lines format?

For example, if a file contains the following lines (extracted from here):


How can I load only the `id`, `father`, and `mother` features (leaving out the `children` feature)?


We use PyArrow to read JSON files into Arrow tables, but according to the documentation for pyarrow.json.read_json (Apache Arrow v14.0.1), it doesn’t seem to be possible to load only a subset of fields.
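If you need this for JSON Lines anyway, one possible workaround is to filter the fields yourself while parsing, before the data reaches Arrow. Below is a minimal sketch using only the standard library. The helper `read_selected_fields`, the file name `people.jsonl`, and the sample records are all made up for illustration (they are not the data from the question), and each line is still parsed in full, so this saves memory in the resulting table rather than parsing time.

```python
import json
import os
import tempfile

FIELDS = ("id", "father", "mother")  # the fields named in the question


def read_selected_fields(path, fields=FIELDS):
    """Parse a JSON Lines file and keep only the requested keys of each record.

    Hypothetical helper, not part of `datasets` or PyArrow.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Missing keys become None so every row has the same columns.
            rows.append({name: record.get(name) for name in fields})
    return rows


# Made-up sample records, only to make the sketch runnable.
sample = [
    {"id": 1, "father": "A", "mother": "B", "children": ["C", "D"]},
    {"id": 2, "father": "E", "mother": "F", "children": []},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for rec in sample:
            f.write(json.dumps(rec) + "\n")
    rows = read_selected_fields(path)

print(rows)
```

The resulting list of dicts could then be turned into a dataset, for example with `datasets.Dataset.from_list(rows)`, assuming the filtered data fits in memory.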

It is possible to load a subset of fields if the data is in Parquet, though, since Parquet is a columnar format. You just need to pass `columns=...` (see the ParquetConfig parameters).
