Data format for text-to-image

jbmaxwell · January 13, 2023, 8:51pm

Hi All,

I’m trying to load a custom dataset for text-to-image fine-tuning but I’m not sure how the data needs to be formatted. Right now I have a folder of png files and a csv that maps png paths to captions, with column names of “image” and “text”. But it seems like it needs a different format.

Any help appreciated!

jbmaxwell · January 13, 2023, 8:58pm

gulp… yeah, rtfm: Create an image dataset

Apologies. Maybe my post will prevent further ones like it.

jbmaxwell · January 13, 2023, 11:30pm

Actually, wait… I’ve created a file with the format given under “Image Captioning” on the doc page, but I’m hitting an error when running train_text_to_image.py:

Exception has occurred: ArrowInvalid
JSON parse error: Missing a name for object member. in row 0

Obviously something is wrong, but I’m not sure what…
I have a data folder with all my png files and my metadata.jsonl file, formatted as:

{"file_name": "something_1.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}
{"file_name": "something_2.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}
...
{"file_name": "something_n.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}

What am I not understanding here… ?
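One way to narrow down an error like this is to parse each line of metadata.jsonl separately with Python's json module and see which ones fail. The following is only a rough sketch: the helper name `find_bad_jsonl_lines` and the sample lines are made up for illustration, and the second sample line (single-quoted, as produced by printing a Python dict) is just one guess at what a malformed row might look like.

```python
import json


def find_bad_jsonl_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
    return problems


# Hypothetical sample: the first line is valid JSON, the second is a Python
# dict repr with single quotes, which a JSON parser rejects.
sample = [
    '{"file_name": "something_1.png", "text": ["caption_1", "caption_2"]}',
    "{'file_name': 'something_2.png', 'text': ['caption_1']}",
]

for number, message in find_bad_jsonl_lines(sample):
    print(f"line {number}: {message}")
```

To check a real file, the same function can be given an open file handle, e.g. `find_bad_jsonl_lines(open("metadata.jsonl", encoding="utf-8"))`, since iterating a file yields its lines.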

jbmaxwell · January 14, 2023, 12:14am

Okay, solved: I just had to serialize each line's dict with json.dumps before writing it to metadata.jsonl…
json_formatted = json.dumps(line_dict)
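For anyone landing here later, a slightly fuller sketch of that fix is below. It assumes each row is a (file name, list of captions) pair. The helper name `write_metadata_jsonl`, the sample rows, and the temporary output directory are all hypothetical; in practice the file would go in the same folder as the png files.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_jsonl(rows, path):
    """Write one JSON object per line, using the file_name/text keys shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for file_name, captions in rows:
            line_dict = {"file_name": file_name, "text": captions}
            # json.dumps emits valid JSON (double-quoted keys and strings),
            # unlike str(line_dict), which emits a Python repr.
            f.write(json.dumps(line_dict) + "\n")


# Hypothetical rows standing in for the real image/caption pairs.
rows = [
    ("something_1.png", ["caption_1", "caption_2"]),
    ("something_2.png", ["caption_1"]),
]

# A temp directory is used here only so the sketch runs anywhere.
out_path = Path(tempfile.mkdtemp()) / "metadata.jsonl"
write_metadata_jsonl(rows, out_path)
written = out_path.read_text(encoding="utf-8").splitlines()
print(written[0])
```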
