I would love to contribute by porting the DPR model to a TFDPR model.
This is my first time trying to contribute, so please bear with me on my simple questions.
I have followed @sshleifer 's great PR on the TFBart model, touching 4 files: __init__.py, convert_pytorch_checkpoint_to_tf2.py, utils/dummy_tf_objects.py, and (newly created) modeling_tf_dpr.py.
Now the TF code runs properly on the examples from PyTorch's DPR (correct input/output tensors). However, the method .from_pretrained(..., from_pt=True) does not seem to work properly, as there are always warning messages that weights could not be loaded:
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDPRContextEncoder: ['ctx_encoder.bert_model.encoder.layer.7.intermediate.dense.bias', 'ctx_encoder.bert_model.encoder.layer.7.attention.output.dense.weight', ... (long list of every variable)]
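To make the failure mode concrete, here is a minimal, self-contained sketch (not the real transformers loader, just an assumed simplification) of how PT→TF weight loading matches variables by name: any PyTorch weight whose name has no TF counterpart ends up in the "Some weights ... were not used" warning. With one extra "bert" scope in every TF name, every PyTorch weight mismatches:

```python
# Two PyTorch weight names taken from the warning above.
pt_weights = [
    "ctx_encoder.bert_model.encoder.layer.7.intermediate.dense.bias",
    "ctx_encoder.bert_model.encoder.layer.7.attention.output.dense.weight",
]

# TF names as built with a nested TFBertModel: an extra "bert" scope
# (assumption: this is the only difference between the two name sets).
tf_weights = {name.replace("bert_model.", "bert_model.bert.") for name in pt_weights}

# Name-based matching: every PyTorch weight fails to find a TF partner,
# so the whole list is reported as "not used".
unused = [name for name in pt_weights if name not in tf_weights]
print(unused)
```

Since no name matches, the warning lists every variable, which is exactly the symptom above.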
@sshleifer Thanks for the insights!!
I am now able to clear up most of the naming mismatches.
The last issue is that DPREncoder holds a variable from the BertModel class (which I have already solved, but I am seeking confirmation; see below).
And there is one important implementation difference between PyTorch's BertModel and TFBertModel: TFBertModel contains one more nested class, TFBertMainLayer (initialized with name='bert'), while PyTorch's BertModel has no such MainLayer class, so the weight naming is inherently different.
In PyTorch a name looks like ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight,
while the TF name has an 'extra' bert segment inside due to the TFBertMainLayer class: ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight
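The mismatch above can be expressed as a one-line rename. This is a sketch under the assumption that the extra TFBertMainLayer scope is the only difference between the two name sets; tf_to_pt is a hypothetical helper, not part of transformers:

```python
def tf_to_pt(name: str) -> str:
    """Drop the extra 'bert' scope that TFBertMainLayer inserts into TF names."""
    return name.replace(".bert_model.bert.", ".bert_model.")

# TF variable name with the extra "bert" segment:
tf_name = "ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight"
print(tf_to_pt(tf_name))
# ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight
```

A rename like this would patch the symptom, but fixing the module structure itself (below) avoids the mapping entirely.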
=========== Solution
So the solution I made is to simply change from TFBertModel to TFBertMainLayer, and now all PyTorch weights load correctly for ctx_encoder, question_encoder, and reader. To make progress on this TFDPR, I will open an issue on GitHub asking for PR suggestions.
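Why the swap works can be illustrated without TensorFlow: Keras prefixes each variable with the name of every enclosing layer, so removing the TFBertModel wrapper removes one path segment. A minimal sketch (assumed, simplified name-scoping only, not real Keras layers):

```python
def qualified_name(scopes, leaf="encoder.layer.3.attention.self.key.weight"):
    """Join nested layer name scopes the way Keras builds variable names."""
    return ".".join(scopes + [leaf])

# Before: TFDPRContextEncoder -> TFBertModel(name="bert_model"), which itself
# wraps TFBertMainLayer(name="bert"), adding an extra "bert" segment.
before = qualified_name(["ctx_encoder", "bert_model", "bert"])

# After: TFDPRContextEncoder -> TFBertMainLayer(name="bert_model") directly.
after = qualified_name(["ctx_encoder", "bert_model"])

# PyTorch name from the original DPR checkpoint:
pytorch_name = "ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight"

print(before == pytorch_name)  # False: extra "bert" breaks the match
print(after == pytorch_name)   # True: names line up, weights load
```

With the flattened structure the TF names match the PyTorch checkpoint exactly, which is why from_pt=True stops warning.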