Generating Croissant Metadata for Custom Image Dataset

Hi John, I’m now trying to load my dataset and use push_to_hub to push it to a new dataset repository. This is the script I’m using:

from datasets import load_dataset

dataset = load_dataset(
    # path="eztao/RefRef_test",
    path="yinyue27/RefRef",
    name="single-convex",
    scene="ball",
    split="textured_sphere_scene",
    trust_remote_code=True
)

print(dataset)  # Should show the dataset structure

dataset.push_to_hub("eztao/RefRef_parquet")

The print works, but push_to_hub then fails with this error:

Dataset({
   features: ['image', 'depth', 'mask', 'transform_matrix', 'rotation'],
   num_rows: 300
})
Traceback (most recent call last):
 File "/home/u7543832/PhD/DataBuilder.py", line 14, in <module>
   dataset.push_to_hub("eztao/RefRef_parquet")
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 5549, in push_to_hub
   additions, uploaded_size, dataset_nbytes = self._push_parquet_shards_to_hub(
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 5349, in _push_parquet_shards_to_hub
   dataset_nbytes = self._estimate_nbytes()
                    ^^^^^^^^^^^^^^^^^^^^^^^
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 5177, in _estimate_nbytes
   table_visitor(table, extra_nbytes_visitor)
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/table.py", line 2378, in table_visitor
   _visit(table[name], feature)
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/table.py", line 2358, in _visit
   _visit(chunk, feature)
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/table.py", line 2362, in _visit
   function(array, feature)
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/arrow_dataset.py", line 5172, in extra_nbytes_visitor
   size = xgetsize(x["path"])
          ^^^^^^^^^^^^^^^^^^^
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/datasets/utils/file_utils.py", line 769, in xgetsize
   size = fs.size(main_hop)
          ^^^^^^^^^^^^^^^^^
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/fsspec/spec.py", line 696, in size
   return self.info(path).get("size", None)
          ^^^^^^^^^^^^^^^
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/huggingface_hub/hf_file_system.py", line 727, in info
   _raise_file_not_found(path, None)
 File "/home/u7543832/anaconda3/lib/python3.12/site-packages/huggingface_hub/hf_file_system.py", line 1136, in _raise_file_not_found
   raise FileNotFoundError(msg) from err
FileNotFoundError: datasets/yinyue27/RefRef@main/image_data/textured_sphere_scene/single-convex/ball_sphere/./train/r_0.png

It seems that I can load the dataset (I also plotted one of the images to make sure), and the file path is correct, but I keep getting this error. Could you help me with it? Thanks!
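One detail that may be relevant (an unconfirmed hypothesis, not a diagnosis): the path in the error contains a ./ segment (ball_sphere/./train/r_0.png). Decoding the image evidently works, but push_to_hub sizes each file through xgetsize → fs.size → HfFileSystem.info, and that lookup may compare paths literally, so the ./ could make an existing file look missing. The sketch below uses only the standard library to detect and collapse such segments. The helper names (has_dot_segment, normalize_repo_path) are hypothetical, and the transforms.json-style value "./train/r_0" is an assumed reconstruction of how the loading script builds the path, not taken from the RefRef script itself.

```python
import posixpath

# Path copied verbatim from the FileNotFoundError above.
FAILING_PATH = (
    "datasets/yinyue27/RefRef@main/image_data/textured_sphere_scene/"
    "single-convex/ball_sphere/./train/r_0.png"
)


def has_dot_segment(path: str) -> bool:
    """Return True if any '/'-separated segment of the path is '.' or '..'."""
    # Drop an optional protocol prefix such as "hf://" before splitting.
    _, _, rest = path.rpartition("://")
    return any(segment in (".", "..") for segment in rest.split("/"))


def normalize_repo_path(path: str) -> str:
    """Collapse '.' and '..' segments, keeping any protocol prefix like 'hf://'."""
    protocol, sep, rest = path.rpartition("://")
    # posixpath.normpath would also squash the "//" of "hf://", so it is
    # applied only to the part after the protocol.
    return f"{protocol}{sep}{posixpath.normpath(rest)}"


if __name__ == "__main__":
    # Hypothetical reconstruction: a NeRF-style transforms.json entry such as
    # {"file_path": "./train/r_0"} joined onto the scene directory reproduces
    # the "./" segment seen in the error.
    scene_dir = "image_data/textured_sphere_scene/single-convex/ball_sphere"
    joined = posixpath.join(scene_dir, "./train/r_0") + ".png"
    print(joined)
    # image_data/textured_sphere_scene/single-convex/ball_sphere/./train/r_0.png
    print(has_dot_segment(FAILING_PATH))  # True
    print(normalize_repo_path(FAILING_PATH))
    # datasets/yinyue27/RefRef@main/image_data/textured_sphere_scene/single-convex/ball_sphere/train/r_0.png
```

If the ./ segment does turn out to be the culprit, normalizing it where the loading script builds each image path would be the natural place to fix it; this is untested.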
