PIL.UnidentifiedImageError: cannot identify image file

We have a dataset on our organization page. No matter how we build it, configure it, and load it for a fine-tuning job using the Datasets library, we always get this error.
It is stored as Parquet on Hugging Face and is then broken down into Arrow files, as shards of the Parquet file.
Every path and storage layout we have tried fails in this same way when we try to start the fine-tune.

Hi! What’s your dataset structure? How do you load it?

Hi there, thanks for the response.

In general, the dataset is sound and the loading method is proper. The error screenshot above comes from a loading technique we tried just to see whether we could get anything working at all: we cloned the dataset from the Hub directly into the environment so that we had a local path to use as the training directory. We get this same error for every local path that holds the Arrow/Parquet-formatted files. Loading the dataset with the following, however, has succeeded every time:
dataset = load_dataset("dataset-path", split="train")
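For anyone hitting the same error: PIL raises UnidentifiedImageError when it is asked to open a file it cannot recognize as an image, so it can help to check what a local path actually contains before handing it to a trainer. The following is a minimal sketch, not part of any library: the helper names sniff_format and summarize_folder are made up for illustration, and only a few file signatures are checked (PNG, JPEG, and the "PAR1" marker that Parquet files start with).

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common formats.
# This list is deliberately short; anything else is reported as "unknown".
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"PAR1": "parquet",
}


def sniff_format(path):
    """Return a format name based on the file's first bytes, or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def summarize_folder(folder):
    """Count the files under `folder` (recursively) by sniffed format."""
    counts = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            kind = sniff_format(path)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

If summarize_folder on the cloned directory reports mostly "parquet" or "unknown" and no "png" or "jpeg", a script that opens every file as an image would be expected to fail in this way.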

The main problem is as follows: after some investigation into the training script we are using from the diffusers library, I found that it does not support loading datasets from the Hub at all. It only recognizes individual image files placed in a local folder.
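In the meantime, one possible workaround is to write the already-loaded dataset back out as individual image files in a local folder. This is only a sketch under assumptions: export_images is a hypothetical helper, the column name "image" is a guess at the dataset's schema, and each value in that column is assumed to be a decoded PIL image (or anything else with a PIL-style save method).

```python
from pathlib import Path


def export_images(rows, out_dir, image_column="image", ext="png"):
    """Save row[image_column] of every row as its own file in out_dir.

    `rows` is any iterable of dict-like rows. Each image only needs a
    PIL-style .save(path) method. The column name is an assumption.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, row in enumerate(rows):
        # Zero-padded index keeps the files in dataset order.
        target = out / f"{i:06d}.{ext}"
        row[image_column].save(target)
        written.append(target)
    return written
```

With the dataset loaded as shown earlier, calling export_images(dataset, "train_images") would fill a train_images folder (a made-up name) with one file per row, and that folder could then be given to the training script as its local image directory. Any captions or other columns would need to be exported separately.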

Oh I see. It could be worth opening an issue in the diffusers repository to request support for Parquet datasets, or for any dataset on the Hub.

Sounds good and will do!