Are BERT models in Transformers pretrained with Whole Word Masking?
It depends on the checkpoint you are using; both versions are available. For instance, bert-base-uncased
is the original BERT model pretrained without whole word masking (WWM), while bert-large-uncased-whole-word-masking
is pretrained with WWM. Check all the checkpoints available here
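Since WWM only changes the pretraining objective, both checkpoints are loaded the same way. A minimal sketch, where `checkpoint_name` and `load_bert` are illustrative helpers (not part of the Transformers API) that pick between the two checkpoint ids mentioned above:

```python
def checkpoint_name(whole_word_masking: bool) -> str:
    # Map the pretraining variant to its Hub checkpoint id.
    # These are the two checkpoints named in the answer above.
    return ("bert-large-uncased-whole-word-masking"
            if whole_word_masking
            else "bert-base-uncased")

def load_bert(whole_word_masking: bool = False):
    """Return (tokenizer, model) for the chosen pretraining variant.

    The import and the weight download happen only when this
    function is actually called.
    """
    from transformers import AutoModelForMaskedLM, AutoTokenizer
    name = checkpoint_name(whole_word_masking)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)
    return tokenizer, model
```

Downstream usage is identical for both variants; only the pretraining differs.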