I uploaded a dataset with image and text columns in Parquet format. The image column is a dictionary with “bytes” and “path” keys, as required. But the dataset preview does not recognize the images and displays the dictionary instead. How can I fix this so that the images are displayed in the preview section?
Can you share the repository?
Thanks for replying. The dataset is this: LouisChen15/ConstructionSite · Datasets at Hugging Face. I removed the original dataset and left only ten images as a demo, but they have the same issue as described.
cc @lhoestq? Images in a parquet file.
Hi! Pandas sets the type of the “image” column to a struct of bytes and path, since it doesn’t have an image type (yet?).
Ideally it would be great to have a way to define type metadata. Maybe we could have something like this in the future?
df.image.attrs = {"dtype": "image"}
df.to_parquet("hf://datasets/LouisChen15/ConstructionSite/test_split_demo.parquet")
Anyway, right now if you want to set the type to image, you can define the types in the README.md in YAML:
dataset_info:
  features:
  - name: image
    dtype: image
  - name: image_id
    dtype: string
  - name: image_caption
    dtype: string
  - name: illumination
    dtype: string
  - name: camera_distance
    dtype: string
  - name: view
    dtype: string
  - name: quality_of_info
    dtype: string
  - name: rule_1_violation
    struct:
    - name: bounding_box
      sequence:
        sequence: float64
    - name: reason
      dtype: string
  - name: rule_2_violation
    dtype: 'null'
  - name: rule_3_violation
    struct:
    - name: bounding_box
      sequence:
        sequence: float64
    - name: reason
      dtype: string
  - name: rule_4_violation
    dtype: 'null'
  - name: excavator
    sequence:
      sequence: float64
  - name: rebar
    sequence:
      sequence: float64
  - name: worker_with_white_hard_hat
    sequence: 'null'
Thank you, it finally works. The metadata is exactly what I needed. However, I did try using metadata to solve the problem before, but I only included the image part:
- name: image
  dtype: image
But it did not work. Does that mean I have to correctly specify the data type of all the “names” so that the system can process the file?
Yes, all the columns and their types are required in the YAML.
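Since every column has to be listed, it can help to generate the block rather than type it by hand. Below is a minimal sketch in plain Python, not part of any Hugging Face library: the function names (dataset_info_yaml, feature_lines, type_lines) and the input convention (a string for a dtype, a one-element list for a sequence, a dict for a struct) are made up for illustration, and it only covers the shapes that appear in the YAML earlier in this thread.

```python
def scalar(dtype):
    # A bare `null` would be parsed by YAML as None, so it must be quoted.
    return "'null'" if dtype == "null" else dtype


def type_lines(spec):
    """Return the YAML lines describing one feature type, unindented."""
    if isinstance(spec, str):            # plain dtype, e.g. "string" or "image"
        return [f"dtype: {scalar(spec)}"]
    if isinstance(spec, list):           # one-element list means a sequence
        (inner,) = spec
        if isinstance(inner, str):
            return [f"sequence: {scalar(inner)}"]
        if isinstance(inner, list):
            return ["sequence:"] + ["  " + line for line in type_lines(inner)]
        raise NotImplementedError("sequences of structs are not covered here")
    if isinstance(spec, dict):           # dict means a struct with named fields
        return ["struct:"] + feature_lines(spec)
    raise TypeError(f"unsupported spec: {spec!r}")


def feature_lines(columns):
    """Return the `- name: ...` entries for a mapping of column -> spec."""
    lines = []
    for name, spec in columns.items():
        lines.append(f"- name: {name}")
        lines.extend("  " + line for line in type_lines(spec))
    return lines


def dataset_info_yaml(columns):
    """Render a dataset_info block that lists every column in `columns`."""
    body = ["  " + line for line in feature_lines(columns)]
    return "\n".join(["dataset_info:", "  features:"] + body)


if __name__ == "__main__":
    # A few of the columns from this thread, as an example.
    print(dataset_info_yaml({
        "image": "image",
        "image_id": "string",
        "rule_1_violation": {"bounding_box": [["float64"]], "reason": "string"},
        "rule_2_violation": "null",
        "excavator": [["float64"]],
    }))
```

The printed text goes into the YAML section at the top of README.md (the part between the two --- lines). Passing every column of the parquet file to the helper is what matters here: as noted above, listing only the image column is not enough.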