Pre-trained models that weren't trained on Wikipedia?

From the BERT paper:

For the pre-training corpus we use the BooksCorpus (800M words) (Zhu et al., 2015) and English Wikipedia (2,500M words).

Are there any pre-trained transformers like BERT that weren’t trained on Wikipedia?

From the ELECTRA paper:

For most experiments we pre-train on the same data as BERT, which consists of 3.3 Billion tokens from Wikipedia and BooksCorpus (Zhu et al., 2015). However, for our Large model we pre-trained on the data used for XLNet (Yang et al., 2019), which extends the BERT dataset to 33B tokens …

So that excludes ELECTRA and XLNet too, since both pre-training corpora include Wikipedia.

For posterity: someone recommended “tweet BERT” or something like that, which was trained only on tweets.
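
That is most likely BERTweet, a RoBERTa-style model pre-trained only on English tweets (roughly 850M of them), with no Wikipedia in the corpus. A minimal sketch of loading it through the Hugging Face transformers library, assuming the usual hub ID vinai/bertweet-base and that PyTorch is installed:

```python
from transformers import AutoModel, AutoTokenizer

# BERTweet: pre-trained only on English tweets, no Wikipedia/BooksCorpus.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModel.from_pretrained("vinai/bertweet-base")

# Encode a short tweet-like sentence and pull out contextual embeddings.
inputs = tokenizer("just tried the new coffee place downtown, not bad",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

From there it can be fine-tuned like any other BERT-style encoder; just keep in mind its tokenizer and vocabulary were built from tweets, so it may transfer less well to formal text.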