Pre-training datasets for bert-base and roberta

I find that the pre-training datasets listed for bert-base-uncased, roberta-base, roberta-large… are the same: wikipedia (19.3k) and bookcorpus (8.4k). This is not consistent with the datasets described in the papers: the RoBERTa paper states that, in addition to BookCorpus and English Wikipedia, it was trained on CC-News, OpenWebText, and Stories (roughly 160GB in total), whereas the model cards list only the two BERT datasets. Is this an error? If it is, could it be revised?
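
For reference, here is a minimal sketch of how I checked the dataset tags declared in each model card's metadata, assuming the `huggingface_hub` Python library is installed (the `dataset:<name>` tag format is just how datasets currently appear in the metadata; repo ids are the ones mentioned above):

```python
from huggingface_hub import model_info

# Print the dataset tags declared in each model card's metadata.
for repo_id in ["bert-base-uncased", "roberta-base", "roberta-large"]:
    tags = model_info(repo_id).tags
    datasets = [t.split("dataset:", 1)[1] for t in tags if t.startswith("dataset:")]
    print(f"{repo_id}: {datasets}")
```

All three currently print the same two datasets, which is what prompted this question.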