It looks like there is a discrepancy on my end between uploading data through the website GUI and uploading through Python. When I upload with Python, the Parquet file for the images doesn't have the correct headers. The GUI upload works, and I'm able to load the files after creating the dataset. It looks like the GUI runs some kind of scripting during upload that makes this work. However, that seems hacked together and hasn't caught up with the Python implementation. If this is a Hugging Face issue, that would be good to know; otherwise it's just something to be aware of. If anybody knows anything that could help fix this locally, that would be great! Thanks for what you guys are doing!
Hi. Do you have a repository that we could look at?
Hi! Can you share the code that shows the issue with Parquet? I suspect that the image column is not of type datasets.Image before conversion to Parquet.
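One way to check this locally is to look at the feature types before pushing. The following is only a rough sketch: the helper names are made up for illustration, the column name "image" is an assumption based on what the imagefolder loader normally produces, and the check compares class names so that the helpers themselves need nothing installed.

```python
def columns_by_feature_type(features):
    """Map each column name to the class name of its feature type."""
    return {name: type(feature).__name__ for name, feature in features.items()}


def missing_image_columns(features, expected=("image",)):
    """Return the expected image columns that are absent or not typed as Image."""
    kinds = columns_by_feature_type(features)
    return [name for name in expected if kinds.get(name) != "Image"]


def check_folder(data_dir):
    """Load a local image folder and list columns that are not Image-typed.

    Hypothetical helper: assumes the `datasets` library is installed and that
    the folder loads as a single "train" split.
    """
    from datasets import load_dataset  # imported lazily; only needed here

    dataset = load_dataset("imagefolder", data_dir=data_dir, split="train")
    return missing_image_columns(dataset.features)
```

If check_folder("/pics") returns an empty list, the image column already has the Image type before conversion, and the problem is probably somewhere else.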
For sure. I was just working from the excellent DreamBooth Hackathon notebook:
from datasets import load_dataset, load_from_disk

dataset = load_dataset("imagefolder", data_dir="/pics", description="Sunshine")
# Remove the dummy label column
dataset = dataset.remove_columns("label")
# Push to Hub
dataset.push_to_hub("Sunshine-the-Chicken")
It was just an unlabeled dataset of pics, so there is only one column. I just get a weird error about not being able to uncompress the Parquet file when using the dataset after uploading it.
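A quick local sanity check, in case it helps narrow this down: a valid Parquet file starts and ends with the 4-byte marker PAR1, so a downloaded file that fails this test is not really Parquet (for example a truncated download or a pointer file). This is only a sketch using the standard library; looks_like_parquet is a made-up helper name, and passing the check does not prove the file can be decompressed.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of every Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # Too small to hold both the leading and trailing markers.
    if data_path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with data_path.open("rb") as handle:
        head = handle.read(len(PARQUET_MAGIC))
        handle.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = handle.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Running it on the Parquet file downloaded from the Hub repo would at least show whether the file on the Hub is a complete Parquet file or something else.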