I noticed that the weight paths listed in `_tied_weights_keys` are inconsistent across the BERT model classes.
In `BertForMaskedLM` [link]:

```python
_tied_weights_keys = ["predictions.decoder.bias", "cls.predictions.decoder.weight"]
```
In `BertForPreTraining` [link]:

```python
_tied_weights_keys = ["cls.predictions.decoder.bias", "cls.predictions.decoder.weight"]
```

Note that the bias entry in `BertForMaskedLM` lacks the `cls.` prefix, while the weight entry in the same list (and both entries in `BertForPreTraining`) include it.
Questions:
- Why are there different paths for accessing the same weights?
- Is one of these paths incorrect? If so, which one should be used?
- If both are valid, what’s the rationale behind using different paths?
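
For what it's worth, here is a quick way to check which of these paths actually resolves on the module tree. This is just a sketch assuming a stock `transformers` install; the checkpoint name is only an example, and I use `get_parameter()` rather than `named_parameters()` because the latter de-duplicates tied tensors by default.

```python
# Check which candidate key paths resolve to a parameter on BertForMaskedLM.
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

candidates = [
    "predictions.decoder.bias",        # bias entry from BertForMaskedLM
    "cls.predictions.decoder.bias",    # bias entry from BertForPreTraining
    "cls.predictions.decoder.weight",  # weight entry common to both
]

for key in candidates:
    try:
        # get_parameter() walks the dotted path on the module tree and
        # raises AttributeError if no parameter lives at that path.
        model.get_parameter(key)
        print(f"{key}: resolves")
    except AttributeError:
        print(f"{key}: does not resolve")
```

If I read the modeling code correctly, the MLM head is attached as `self.cls` in both classes, so I would expect `predictions.decoder.bias` not to resolve and the `cls.`-prefixed paths to be the correct ones; but I may be missing a key-remapping step somewhere in the loading logic, hence the questions above.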