I want to load my Hugging Face (HF) dataset using pyarrow. What should I pass as the schema in the code below?
import pyarrow as pa
import pyarrow.dataset as ds
data_path = "/some/path/to/huggingface/dataset"
schema = pa.schema([])
dataset = ds.dataset(data_path, format="arrow", schema=schema)
From the “dataset_info.json” file in the Hugging Face dataset directory, I have the following:
"features": {
  "index": {
    "dtype": "int32",
    "_type": "Value"
  },
  "example": {
    "shape": [null, 2],
    "dtype": "float32",
    "_type": "Array2D"
  },
  "label": {
    "shape": [null, 3],
    "dtype": "float32",
    "_type": "Array2D"
  },
  "coords_label": {
    "shape": [null, 3],
    "dtype": "float32",
    "_type": "Array2D"
  },
  "coords_num": {
    "feature": {
      "dtype": "int32",
      "_type": "Value"
    },
    "_type": "Sequence"
  },
  "has_noise": {
    "dtype": "bool",
    "_type": "Value"
  }
},