DALL-E - mini version

As per the authors of DALL·E, two datasets were used: Conceptual Captions (already suggested by @valhalla) and a subset of YFCC100M.

> The model was trained on publicly available text-image pairs collected from the internet. This data consists partly of Conceptual Captions and a filtered subset of YFCC100M. We used a subset of the filters described in Sharma et al. to construct this dataset; further details are described in our paper. We will not be releasing the dataset.

Also, they released CLIP, which they also trained on the same YFCC100M data, and they later added details of the subset used for CLIP.

The subset contains 14,829,396 images, about 15% of the full dataset, and they showed that CLIP's performance remained largely the same when trained on this subset.
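In case it helps, here is a rough sketch of how one might pull that subset out of the full YFCC100M metadata. The file names, the bz2/TSV format of the released ID list, and the column positions are my assumptions, so please check the CLIP repo and the YFCC100M docs for the exact formats:

```python
import bz2
import csv

# Hedged sketch: filter the YFCC100M metadata down to the ~15M-image
# subset described for CLIP. File names and column positions below are
# assumptions, not the official formats.

SUBSET_IDS_FILE = "yfcc100m_subset_data.tsv.bz2"  # released list of photo IDs (assumed name/format)
YFCC_METADATA_FILE = "yfcc100m_dataset.tsv"       # full YFCC100M metadata dump (assumed name)
OUTPUT_FILE = "yfcc100m_clip_subset.tsv"

# Load the ~14.8M photo IDs kept in the subset into a set for fast lookup.
with bz2.open(SUBSET_IDS_FILE, "rt") as f:
    subset_ids = {row[0] for row in csv.reader(f, delimiter="\t") if row}

print(f"loaded {len(subset_ids):,} subset IDs")  # expected: ~14,829,396

# Stream the full metadata and keep only rows whose photo ID is in the subset.
kept = 0
with open(YFCC_METADATA_FILE, "rt") as src, open(OUTPUT_FILE, "w") as dst:
    for row in csv.reader(src, delimiter="\t"):
        photo_id = row[0]  # assumed: first column is the photo/video identifier
        if photo_id in subset_ids:
            dst.write("\t".join(row) + "\n")
            kept += 1

print(f"kept {kept:,} metadata rows")
```

If I remember correctly, the metadata rows also include download URLs, so from there the images can be fetched and paired with their titles/descriptions for training.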

What if the same subset of YFCC100M were used to train DALL·E? :wink:
Anyway, as the dataset is publicly accessible, I think you might be interested in it. :hugs:

Excited to see the end result. Cheers!!
