BERT model pre-trained on a small English corpus?

Is there any transformer-based model pre-trained on a very small dataset (smaller than the corpus BERT was pre-trained on)?