I have been trying to fine-tune the SDXL model on a subset of LAION-Aesthetics 5+ (~89M image-text pairs) using the example training script.
I load the data with:

```python
dataset = load_dataset("imagefolder", data_dir=args.train_data_dir, split="train")
```

Here, `args.train_data_dir` points to a directory containing over 89M image-text pairs.
But the data loading time is far too long, and it eventually fails to load the data at all.
Is there a more efficient way to load data at this scale for text-to-image model training?
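To make the question concrete: what I think I need is lazy, shard-based iteration, instead of an upfront scan/index of all 89M files. Here is a stdlib-only sketch of that pattern (the JSONL shard layout and field names are made up for illustration, not my actual data format):

```python
import json
import os
import tempfile
from typing import Iterator, Tuple

def iter_pairs(shard_dir: str) -> Iterator[Tuple[str, str]]:
    """Lazily yield (image_path, caption) pairs, one shard at a time.

    Each shard is a JSONL file of {"image": ..., "text": ...} records,
    so no global index of all 89M files is ever built in memory.
    """
    for name in sorted(os.listdir(shard_dir)):
        if not name.endswith(".jsonl"):
            continue
        with open(os.path.join(shard_dir, name)) as f:
            for line in f:
                rec = json.loads(line)
                yield rec["image"], rec["text"]

# Tiny demo with two fake shards.
with tempfile.TemporaryDirectory() as d:
    for i in range(2):
        with open(os.path.join(d, f"shard-{i}.jsonl"), "w") as f:
            f.write(json.dumps({"image": f"img{i}.jpg", "text": f"caption {i}"}) + "\n")
    pairs = list(iter_pairs(d))

print(pairs)
```

Is this roughly what streaming/webdataset-style loaders do under the hood, and is one of them the recommended route here?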
Thanks in advance!