DALL-E - mini version

As per the authors of DALL·E, two datasets were used: Conceptual Captions (already suggested by @valhalla) and a subset of YFCC100M.

> The model was trained on publicly available text-image pairs collected from the internet. This data consists partly of Conceptual Captions and a filtered subset of YFCC100M. We used a subset of the filters described in Sharma et al. to construct this dataset; further details are described in our paper. We will not be releasing the dataset.

Also, they released CLIP, which they also trained on the same YFCC100M data, and they later added details of the subset used for CLIP.

The subset contains 14,829,396 images, about 15% of the full dataset, and they showed that CLIP's performance remained largely the same when trained on this subset.
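In case it helps, here is a rough sketch of how one might pull that subset out of the full YFCC100M metadata. The file names, the bz2/TSV format of the released ID list, and the column positions are my assumptions, so please check the CLIP repo and the YFCC100M docs for the exact formats:

```python
import bz2
import csv

# Hedged sketch: filter the YFCC100M metadata down to the ~15M-image
# subset described for CLIP. File names and column positions below are
# assumptions, not the official formats.

SUBSET_IDS_FILE = "yfcc100m_subset_data.tsv.bz2"  # released list of photo IDs (assumed name/format)
YFCC_METADATA_FILE = "yfcc100m_dataset.tsv"       # full YFCC100M metadata dump (assumed name)
OUTPUT_FILE = "yfcc100m_clip_subset.tsv"

# Load the ~14.8M photo IDs kept in the subset into a set for fast lookup.
with bz2.open(SUBSET_IDS_FILE, "rt") as f:
    subset_ids = {row[0] for row in csv.reader(f, delimiter="\t") if row}

print(f"loaded {len(subset_ids):,} subset IDs")  # expected: ~14,829,396

# Stream the full metadata and keep only rows whose photo ID is in the subset.
kept = 0
with open(YFCC_METADATA_FILE, "rt") as src, open(OUTPUT_FILE, "w") as dst:
    for row in csv.reader(src, delimiter="\t"):
        photo_id = row[0]  # assumed: first column is the photo/video identifier
        if photo_id in subset_ids:
            dst.write("\t".join(row) + "\n")
            kept += 1

print(f"kept {kept:,} metadata rows")
```

If I remember correctly, the metadata rows also include download URLs, so from there the images can be fetched and paired with their titles/descriptions for training.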

What if the same subset of YFCC100M were used to train DALL·E? :wink:
Anyway, as the dataset is publicly accessible, I think you might be interested in it. :hugs:

Excited to see the end result. Cheers!!
