Data format for text-to-image

Hi All,

I’m trying to load a custom dataset for text-to-image fine-tuning, but I’m not sure how the data needs to be formatted. Right now I have a folder of PNG files and a CSV that maps PNG paths to captions, with columns named “image” and “text”. But it seems a different format is required.

Any help appreciated!

gulp… yeah, RTFM: the answer is on the “Create an image dataset” doc page.

Apologies. Maybe my post will prevent further ones like it. :face_with_open_eyes_and_hand_over_mouth:

Actually, wait… I’ve created a file with the format given under “Image Captioning” on the doc page, but I’m hitting an error when running train_text_to_image.py:

Exception has occurred: ArrowInvalid
JSON parse error: Missing a name for object member. in row 0

Obviously something is wrong, but I’m not sure what…
I have a data folder with all my png files and my metadata.jsonl file, formatted as:

{"file_name": "something_1.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}
{"file_name": "something_2.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}
...
{"file_name": "something_n.png", "text": ["caption_1", "caption_2", ..., "caption_n"]}

What am I not understanding here… ?
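For anyone hitting the same ArrowInvalid error, one way to narrow it down is to run every line of metadata.jsonl through Python's own JSON parser and see which rows it rejects. The sketch below is only a diagnostic under assumptions: `find_bad_jsonl_rows` is a made-up helper name, and the single-quoted sample line is just one common way a row can end up invalid (a dict written out with str() instead of being serialized), not necessarily what happened here.

```python
import json

def find_bad_jsonl_rows(lines):
    """Return (row_index, error_message) for every non-empty line that is not valid JSON."""
    bad = []
    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            bad.append((i, err.msg))
    return bad

# Illustrative rows only: the second one uses single quotes, which JSON does not allow.
sample = [
    '{"file_name": "something_1.png", "text": "caption_1"}',
    "{'file_name': 'something_2.png', 'text': 'caption_2'}",
]
print(find_bad_jsonl_rows(sample))
```

With a real file you would pass the open file object instead of `sample`, since iterating over a file yields its lines.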

Okay, fixed: I just had to serialize each line's dict with json.dumps before writing it to metadata.jsonl… :slight_smile:
json_formatted = json.dumps(line_dict)
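In case it helps someone starting from the same CSV layout (“image” and “text” columns), here is a rough sketch of the whole conversion using json.dumps for every line. `csv_to_jsonl` is a hypothetical helper, the demo runs on in-memory strings, and it assumes one caption string per image and that the “image” column already holds paths relative to the folder containing metadata.jsonl.

```python
import csv
import io
import json

def csv_to_jsonl(csv_file, jsonl_file):
    """Write one JSON object per CSV row, mapping "image" -> "file_name" and keeping "text"."""
    count = 0
    for row in csv.DictReader(csv_file):
        line_dict = {"file_name": row["image"], "text": row["text"]}
        # json.dumps handles the quoting and escaping, so each line is valid JSON
        jsonl_file.write(json.dumps(line_dict) + "\n")
        count += 1
    return count

# In-memory demo with a made-up row; for real data, pass open file objects instead.
csv_in = io.StringIO('image,text\nsomething_1.png,"a caption, with a comma"\n')
jsonl_out = io.StringIO()
rows_written = csv_to_jsonl(csv_in, jsonl_out)
print(jsonl_out.getvalue())
```

Letting json.dumps build each line avoids hand-formatting mistakes such as single quotes, unescaped quotes inside captions, or trailing commas.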