Are BERT models pretrained with Whole Word Masking?

Are BERT models in Transformers pretrained with Whole Word Masking?

It depends on the checkpoint you are using; we provide both versions. For instance, bert-base-uncased is the original BERT model pretrained without WWM, while bert-large-uncased-whole-word-masking is pretrained with WWM. You can browse all the available checkpoints on the Hugging Face Hub.
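For reference, here is a minimal sketch of loading both variants with the transformers library. The checkpoint names are the ones mentioned above; at inference or fine-tuning time the two are used identically, since WWM only changes how tokens were masked during pretraining:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Original BERT, pretrained with standard per-subword masking
model_standard = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer_standard = AutoTokenizer.from_pretrained("bert-base-uncased")

# Variant pretrained with Whole Word Masking (WWM)
model_wwm = AutoModelForMaskedLM.from_pretrained(
    "bert-large-uncased-whole-word-masking"
)
tokenizer_wwm = AutoTokenizer.from_pretrained(
    "bert-large-uncased-whole-word-masking"
)
```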
