Replace weights in TFBertModel

Hi everyone,

I have a multilabel model built from TFAutoModelForSequenceClassification, in which I took the TFBertMainLayer (in the code below it is bert = transformer_model.layers[0]) and added a Dropout and a Dense layer on top of it.
After compiling and fitting the model, I saved the model weights as an h5 file and saved the model architecture in a json file (using model.to_json() in Keras).

        bert = transformer_model.layers[0]
        input_ids = tf.keras.layers.Input(shape=(input_dim,), name='input_ids', dtype='int32')
        attention_mask = tf.keras.layers.Input(shape=(input_dim,), name='attention_mask', dtype='int32')
        inputs = {'input_ids': input_ids, 'attention_mask': attention_mask}
        bert_model = bert(input_ids, attention_mask)[1]
        X = tf.keras.layers.Dropout(transformer_model.config.hidden_dropout_prob, name='pooled_output', trainable=True)(bert_model)
        X = tf.keras.layers.Dense(units=num_labels, activation='sigmoid', name='dense', trainable=True)(X)
        model = tf.keras.Model(inputs=inputs, outputs=X)

I want to visualize the attention weights of the model and came across bertviz. However, it doesn’t look like it works well with models that are not based on pytorch objects.
A possible solution I thought about includes the following steps:

  1. Use the TFBertModel: initialize the TFBertModel and replace the weights of its TFBertMainLayer with the weights of my trained model. Namely, I tried doing something like this:

tf_bert_model = TFBertModel.from_pretrained('bert-base-uncased')

But it doesn’t seem to work and I am not able to replace the weights.
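For what it’s worth, here is a sketch of what I had in mind for step #1, assuming the classifier is built as in the code above (the model here is freshly initialized rather than fine-tuned, just to show the mechanics):

```python
# Sketch of step #1: copy the TFBertMainLayer weights from the classifier
# into a fresh TFBertModel. After fitting (or after loading the saved h5
# weights), the layer taken from the classifier would hold the trained
# weights instead.
from transformers import TFAutoModelForSequenceClassification, TFBertModel

transformer_model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
bert = transformer_model.layers[0]  # the TFBertMainLayer

tf_bert_model = TFBertModel.from_pretrained('bert-base-uncased')

# tf_bert_model.bert is TFBertModel's own TFBertMainLayer. Both layers are
# built from the same config, so their Keras weight lists line up and the
# get_weights()/set_weights() pair can transfer the numbers in one call.
tf_bert_model.bert.set_weights(bert.get_weights())
```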

  2. Then, if I can get step #1 to work, I thought to save the tf_bert_model using tf_bert_model.save_pretrained() and load it into the pytorch class BertModel, which should then enable me to work with bertviz.
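For step #2, my understanding is that the pytorch classes can load a TF checkpoint directly with from_tf=True (this needs both TF and torch installed). A sketch, using an untrained TFBertModel as a stand-in for the one from step #1, and a made-up directory name:

```python
from transformers import TFBertModel, BertModel

# Stand-in for the tf_bert_model from step #1 (here without the copied weights).
tf_bert_model = TFBertModel.from_pretrained('bert-base-uncased')

# Save as a TF checkpoint, then let transformers convert it while loading
# into the pytorch BertModel. './my_finetuned_bert' is just an example path.
# output_attentions=True makes the model return attention weights, which
# bertviz needs.
tf_bert_model.save_pretrained('./my_finetuned_bert')
pt_model = BertModel.from_pretrained('./my_finetuned_bert', from_tf=True,
                                     output_attentions=True)
```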

Any ideas on how I can replace the weights to make step #1 work? Or another way around the issue, so I can get bertviz working with my keras model?

Any help will be greatly appreciated.
Ayala Allon

Hi Ayala (@ayalaall),

I had a similar problem. (I had a model in TF and I wanted to use BertViz). Like you, I thought it should be possible to copy the TF weights into a Pytorch framework. After all, they are just a bunch of numbers. However, I couldn’t get it to work. I could be wrong, but I don’t think there is an easy way to move a model from TF to Pytorch.

My solution (which worked eventually) was to start again and train a new model using Pytorch. This made sense for me, as I wasn’t particularly expert in TF/keras, and I thought it might be handy to learn Pytorch.

Afterwards, I thought it would have been better to write a copy of BertViz that was designed to work with TF. (How hard can it be…?). If you are an expert in TF and in Python then this might be a good solution for you.

A third possibility would be to look at the internal structures of the way the weights are stored for TF and for Pytorch, and to force your model’s numbers into a Pytorch-like structure. Since both the TF and the Pytorch models are implementations of the same Attention-based Encoder, I think this should be theoretically possible. It doesn’t sound easy though.
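The renaming and reshaping that the third possibility involves can be illustrated without either framework. A toy sketch in NumPy, using simplified stand-in weight names (not the real full BERT names):

```python
# Toy illustration of forcing TF-style weights into a Pytorch-like
# structure: rename keys to the Pytorch dotted scheme and transpose dense
# kernels (Keras Dense stores (in, out); torch nn.Linear stores (out, in)).
import numpy as np

tf_weights = {
    'bert/encoder/layer_0/attention/self/query/kernel': np.ones((768, 768)),
    'bert/encoder/layer_0/attention/self/query/bias': np.zeros(768),
}

def tf_name_to_pt(name):
    # Hypothetical renaming rules, enough for these example names only.
    return (name.replace('bert/', '')
                .replace('/', '.')
                .replace('layer_', 'layer.')
                .replace('kernel', 'weight'))

pt_state = {}
for name, arr in tf_weights.items():
    if name.endswith('kernel'):
        arr = arr.T  # Dense kernel -> Linear weight needs a transpose
    pt_state[tf_name_to_pt(name)] = arr
```

A real converter would need the full name mapping and the attention-head reshapes as well, which is where the “doesn’t sound easy” part comes in.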

It is possible that somebody has written a TF-Pytorch converter program. I couldn’t find one when I looked, but that was nearly two years ago, so there might be one now. If you ask another question on this forum with “TF to Pytorch Model Conversion” in the title, somebody might know (but don’t hold your breath waiting).

It is possible that somebody has written a Visualisation tool for TF Bert models. Again, I couldn’t find one two years ago. How much have you searched?
