[Solved] Translating DPR to TFDPR: PyTorch weights fail to load into the TF model

Hi Hugging Face team,

I would love to contribute a TF translation of the DPR model (TFDPR).
This is my first attempt at contributing, so please bear with my simple question.

I followed @sshleifer's great PR for the TFBart model and touched four files: __init__.py, convert_pytorch_checkpoint_to_tf2.py, utils/dummy_tf_objects.py, and the newly created modeling_tf_dpr.py.

The TF code now runs properly on the examples from PyTorch's DPR (correct input/output tensors). However, .from_pretrained(..., from_pt=True) does not seem to work properly: there are always warning messages that the weights could not be loaded:

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDPRContextEncoder: ['ctx_encoder.bert_model.encoder.layer.7.intermediate.dense.bias', 'ctx_encoder.bert_model.encoder.layer.7.attention.output.dense.weight', ... (long list of every variable)]

I have a Colab notebook with the four modified files here: https://colab.research.google.com/drive/1lU4fx7zkr-Y3CXa3wmHIY8yJhKdiN3DI?usp=sharing
(It would be great if Sam or Patrick @patrickvonplaten could take a quick look, as this may be an easy fix.)
(To navigate the changes easily, use “find on page” for e.g. TFDPRContextEncoder.)

Try passing, for example, name='ctx_encoder' as a kwarg to more components (like you do for layers) and calling

super().__init__(config, **kwargs)

inside of them.

That’s how I debugged some TensorFlow layer-name issues (like the one you have),
but I’m no expert.
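
To make the name= suggestion above concrete, here is a toy, pure-Python sketch of how nested layer names end up in a weight's full name. It is not the Keras or transformers API: ToyLayer, weight_names, and build_encoder are hypothetical names made up for illustration, and the auto-naming only roughly imitates what Keras does for unnamed layers.

```python
import itertools

# Counter used to invent names for layers created without one.
_auto_ids = itertools.count()


class ToyLayer:
    """Hypothetical stand-in for a Keras layer; it only models name nesting."""

    def __init__(self, name=None, children=(), weights=()):
        # Without an explicit name, fall back to an auto-generated one,
        # roughly like Keras does for unnamed layers (assumption).
        self.name = name if name is not None else f"toy_layer_{next(_auto_ids)}"
        self.children = list(children)
        self.weights = list(weights)

    def weight_names(self, prefix=""):
        """Return dotted weight names, prefixed by every enclosing layer name."""
        scope = f"{prefix}{self.name}."
        names = [scope + w for w in self.weights]
        for child in self.children:
            names.extend(child.weight_names(scope))
        return names


def build_encoder(outer_name=None):
    """Build a two-level toy model: an outer layer wrapping one dense layer."""
    dense = ToyLayer(name="dense", weights=["weight", "bias"])
    return ToyLayer(name=outer_name, children=[dense])


unnamed = build_encoder()                         # outer layer left unnamed
named = build_encoder(outer_name="ctx_encoder")   # name passed explicitly

print(unnamed.weight_names())  # prefix is an auto-generated toy_layer_N
print(named.weight_names())    # prefix is ctx_encoder
```

In the toy, the unnamed encoder produces names with an auto-generated prefix, which can never match a PyTorch name that starts with ctx_encoder; passing the name explicitly makes the prefixes line up.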

@sshleifer Thanks for the insights!
I have now cleared up most of the naming mismatches.
The last issue is that DPREncoder holds a BertModel as a member variable
(I have already solved this, but I would like confirmation; see below).

There is one important implementation difference between
PyTorch’s BertModel and TFBertModel.

TFBertModel wraps one more nested class, TFBertMainLayer (which is initialized with name='bert'), while PyTorch’s BertModel has no such MainLayer class, so the naming is inherently different.

In PyTorch, a weight name looks like
ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight

while the TF name has an ‘extra’ bert segment because of the TFBertMainLayer class:
ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight
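
A quick way to spot this kind of mismatch is to compare the two dotted names segment by segment. The helper below is only a sketch: extra_segments is a hypothetical name, not part of transformers, and it uses a simple greedy left-to-right comparison that is enough for a single inserted segment like this one.

```python
def extra_segments(pt_name, tf_name):
    """Return the segments of tf_name that have no counterpart in pt_name.

    Greedy left-to-right match: walk the TF segments and advance through the
    PyTorch segments whenever the current pair is equal.
    """
    pt_parts = pt_name.split(".")
    tf_parts = tf_name.split(".")
    extras = []
    i = 0  # position in pt_parts
    for part in tf_parts:
        if i < len(pt_parts) and part == pt_parts[i]:
            i += 1
        else:
            extras.append(part)
    return extras


pt_name = "ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight"
tf_name = "ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight"

print(extra_segments(pt_name, tf_name))  # ['bert']
```

Running it on one name from each side shows that bert is the only segment the PyTorch name lacks, which points at the layer that introduces it.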

===========
Solution
The fix I made is simply to use TFBertMainLayer instead of TFBertModel; with that change, all PyTorch weights load correctly for ctx_encoder, question_encoder, and reader. To move TFDPR forward, I will open an issue on GitHub to ask for suggestions on the PR.

Thank you again!