Pre-trained models that weren't trained on Wikipedia?

From the BERT paper:

For the pre-training corpus we use the BooksCorpus (800M words) (Zhu et al., 2015) and English Wikipedia (2,500M words).

Are there any pre-trained transformers like BERT that weren’t trained on Wikipedia?

From the ELECTRA paper:

For most experiments we pre-train on the same data as BERT, which consists of 3.3 Billion tokens from Wikipedia and BooksCorpus (Zhu et al., 2015). However, for our Large model we pre-trained on the data used for XLNet (Yang et al., 2019), which extends the BERT dataset to 33B tokens …

So that excludes ELECTRA and XLNet too, since both pre-training corpora include Wikipedia.

For posterity: someone recommended “tweet BERT” or something like that, which was trained only on tweets.
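
That is most likely BERTweet, a RoBERTa-style model pre-trained only on English tweets (roughly 850M of them), with no Wikipedia in the corpus. A minimal sketch of loading it through the Hugging Face transformers library, assuming the usual hub ID vinai/bertweet-base and that PyTorch is installed:

```python
from transformers import AutoModel, AutoTokenizer

# BERTweet: pre-trained only on English tweets, no Wikipedia/BooksCorpus.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModel.from_pretrained("vinai/bertweet-base")

# Encode a short tweet-like sentence and pull out contextual embeddings.
inputs = tokenizer("just tried the new coffee place downtown, not bad",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

From there it can be fine-tuned like any other BERT-style encoder; just keep in mind its tokenizer and vocabulary were built from tweets, so it may transfer less well to formal text.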