[Solved] Issue on translating DPR to TFDPR on loading pytorch weights to TF model

Jung · October 27, 2020, 5:18am

Hi Huggingface team,

I would love to contribute by translating the DPR model to TFDPR.
This is my first time trying to contribute, so please bear with me if my question is a simple one.

I have followed @sshleifer's great PR on the TFBart model and changed 4 files: __init__.py, convert_pytorch_checkpoint_to_tf2.py, utils/dummy_tf_objects.py and (newly created) modeling_tf_dpr.py

Now the TF code runs properly with the examples from PyTorch's DPR (correct input/output tensors). However, the method .from_pretrained(..., from_pt=True) does not seem to work properly, as there are always warning messages saying that weights could not be loaded:

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDPRContextEncoder: ['ctx_encoder.bert_model.encoder.layer.7.intermediate.dense.bias', 'ctx_encoder.bert_model.encoder.layer.7.attention.output.dense.weight', ... (long list of every variable)]

I have a Colab notebook with the 4 modified files here: https://colab.research.google.com/drive/1lU4fx7zkr-Y3CXa3wmHIY8yJhKdiN3DI?usp=sharing
(It would be great if Sam or Patrick @patrickvonplaten could take a quick look, as this may be an easy fix.)
(To navigate the changes easily, please use "find on page" for e.g. TFDPRContextEncoder.)

sshleifer · October 27, 2020, 2:38pm

Try passing, for example, name='ctx_encoder' as a kwarg to more components (like you do for layers) and calling

super().__init__(config, **kwargs)

inside of them.

Like this:

That's how I debugged some TensorFlow layer name issues (like the one you have),
but I'm no expert.
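
To make the idea concrete, here is a toy sketch in plain Python. It is not TensorFlow or Keras code: ToyLayer, build and the auto-generated name "toy_layer_1" are hypothetical stand-ins. It only models the general principle that a weight's full name is built from the names of the components that contain it, so a component that is not given the expected name produces paths that no checkpoint name can match.

```python
# Toy model of nested layer naming. This is NOT the Keras API; ToyLayer is a
# hypothetical class used only to show how weight paths are assembled.
class ToyLayer:
    def __init__(self, name, children=(), weights=()):
        self.name = name
        self.children = list(children)
        self.weights = list(weights)

    def weight_paths(self, prefix=""):
        # Every weight path is prefixed by the names of all enclosing layers.
        scope = f"{prefix}{self.name}."
        paths = [scope + w for w in self.weights]
        for child in self.children:
            paths.extend(child.weight_paths(scope))
        return paths


def build(encoder_name):
    dense = ToyLayer("dense", weights=["weight", "bias"])
    return ToyLayer(encoder_name, children=[dense])


# Names as they would appear in a (made-up) source checkpoint.
pt_names = {"ctx_encoder.dense.weight", "ctx_encoder.dense.bias"}

named = build("ctx_encoder").weight_paths()
# "toy_layer_1" stands in for whatever name a framework would auto-generate.
unnamed = build("toy_layer_1").weight_paths()

# Checkpoint weights that find no matching path in the target model.
unused_named = sorted(pt_names - set(named))
unused_unnamed = sorted(pt_names - set(unnamed))
print(unused_named)
print(unused_unnamed)
```

With the explicit name every checkpoint entry finds a match, while with the stand-in auto-generated name all of them end up in the "not used" list, which is the same symptom as the warning above.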

Jung · October 29, 2020, 11:40pm

@sshleifer Thanks for the insights!!
I am now able to clear up most of the naming mismatches.
The last issue is that DPREncoder holds a variable of the BertModel class
(which I have already solved, but I am seeking confirmation; see below).

There is one important implementation difference between
PyTorch's BertModel and TFBertModel.

TFBertModel has one more nested class, TFBertMainLayer (which is initialized with name='bert'), whereas PyTorch's BertModel has no such MainLayer class, so the naming is inherently different.

In PyTorch the name looks like
ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight

and the TF name has an 'extra' bert inside due to the TFBertMainLayer class:
ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight
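
A small helper along these lines can point at such an extra segment automatically. This is only a sketch: extra_segments is a hypothetical function, not part of transformers, and it assumes the two names differ only by segments inserted on the TF side.

```python
def extra_segments(pt_name, tf_name):
    """Return the dot-separated segments present in tf_name but not in pt_name.

    Assumes tf_name is pt_name with zero or more segments inserted.
    Returns None when that assumption does not hold.
    """
    pt_parts = pt_name.split(".")
    tf_parts = tf_name.split(".")
    extras = []
    i = 0  # index of the next PyTorch segment still to be matched
    for part in tf_parts:
        if i < len(pt_parts) and part == pt_parts[i]:
            i += 1
        else:
            extras.append(part)
    # Every PyTorch segment must have been matched, in order.
    return extras if i == len(pt_parts) else None


pt = "ctx_encoder.bert_model.encoder.layer.3.attention.self.key.weight"
tf = "ctx_encoder.bert_model.bert.encoder.layer.3.attention.self.key.weight"
print(extra_segments(pt, tf))  # ['bert']
```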

===========
Solution
So the solution I made is to simply switch from TFBertModel to TFBertMainLayer, and now all PyTorch weights load correctly for ctx_encoder, question_encoder and reader. To move TFDPR forward, I will open an issue on GitHub to ask for suggestions on the PR.

Thank you again!
