Inconsistent _tied_weights_keys path in BERT models

I noticed that there are inconsistent paths in _tied_weights_keys between different BERT model classes.

In BertForMaskedLM [link]:

_tied_weights_keys = ["predictions.decoder.bias", "cls.predictions.decoder.weight"]

In BertForPreTraining [link]:

_tied_weights_keys = ["cls.predictions.decoder.bias", "cls.predictions.decoder.weight"]

Questions:

  1. Why are there different paths for accessing the same weights?
  2. Is one of these paths incorrect? If so, which one should be used?
  3. If both are valid, what’s the rationale behind using different paths?
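For context, here is a minimal sketch (not from the library docs) of how one might check which of the two listed paths actually resolves on a loaded model. It assumes transformers and torch are installed; "bert-base-uncased" is just an example checkpoint:

```python
# Sketch: check whether each key in _tied_weights_keys resolves as an
# attribute path on the model. Assumes transformers + torch; the checkpoint
# name below is only an example.
import operator
from transformers import BertForMaskedLM, BertForPreTraining


def resolve(model, dotted_path):
    """Walk a dotted path like 'cls.predictions.decoder.bias'; None if missing."""
    try:
        return operator.attrgetter(dotted_path)(model)
    except AttributeError:
        return None


for model_cls in (BertForMaskedLM, BertForPreTraining):
    model = model_cls.from_pretrained("bert-base-uncased")
    for key in model_cls._tied_weights_keys:
        status = "resolves" if resolve(model, key) is not None else "does NOT resolve"
        print(f"{model_cls.__name__}: {key!r} {status}")
```

With this kind of check, a key missing the `cls.` prefix would not resolve to a parameter on the model, which is what makes me suspect the two lists are not intentionally different.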