Fine-Tuning BERT Question Answering sequence output problem

While following the instructions in "Fine-tuning with custom datasets — transformers 4.7.0 documentation" using TensorFlow Keras, model.fit produces the error below and fails to start training:

from transformers import TFAutoModelForQuestionAnswering

model = TFAutoModelForQuestionAnswering.from_pretrained("bert-base-multilingual-cased")

...

model.fit(...)

TypeError: The two structures don't have the same sequence type. Input structure has type <class 'tuple'>, while shallow structure has type <class 'transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput'>.

I suspect that it is related to formatting the labels for Keras, as below:

# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))

since the error message is about the output structure, and it says the tuples are not transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput.
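To see why the two structures clash, here is a toy, TensorFlow-free illustration. The class and check below are made-up stand-ins (the real comparison happens inside TensorFlow's nest utilities), but the shape of the problem is the same: transformers' ModelOutput classes subclass OrderedDict, so in graph mode y_pred is dict-like while the mapped labels are a plain tuple.

```python
from collections import OrderedDict

class FakeQAOutput(OrderedDict):
    """Stand-in for TFQuestionAnsweringModelOutput (a dict subclass)."""

def check_same_structure(y_true, y_pred):
    # Rough sketch of the kind of type comparison TensorFlow's nest
    # utilities perform when pairing labels with model outputs.
    if isinstance(y_pred, dict) != isinstance(y_true, dict):
        raise TypeError("The two structures don't have the same sequence type.")

y_pred = FakeQAOutput(start_logits=[0.1], end_logits=[0.9])
y_true = ([12], [17])  # tuple produced by the tutorial's map step -> mismatch
try:
    check_same_structure(y_true, y_pred)
except TypeError as e:
    print(e)  # The two structures don't have the same sequence type.
```

Passing the labels as a dict instead of a tuple makes the check succeed, which is the direction the fix below takes.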

Help?

My platform is Windows 10 and libraries are

print(tf.__version__)
print(torch.__version__)
print(transformers.__version__)

2.4.0
1.9.0+cu111
4.8.2

The error here was because the tutorial asks you to use return_dict=False, but in TensorFlow we have to set run_eagerly=True during model compilation for the return_dict parameter to actually take effect.
According to the documentation:

return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode, in graph mode the value will always be set to True.

There was actually a warning saying that return_dict could not be set to False, which is how I was able to figure out this issue.
I don’t know the equivalent for PyTorch, but I hope this helps.
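For reference, the compile call I mean looks roughly like this. This is a sketch, not verified against every transformers version; the optimizer and loss choices here are my own assumptions (sparse categorical cross-entropy over the start/end logits is the usual QA setup), only the run_eagerly flag is the point:

```python
import tensorflow as tf
from transformers import TFAutoModelForQuestionAnswering

model = TFAutoModelForQuestionAnswering.from_pretrained(
    "bert-base-multilingual-cased"
)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# run_eagerly=True keeps the model out of graph mode, so passing
# return_dict=False in the forward call is respected instead of
# being silently forced back to True.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=loss,
              run_eagerly=True)
```

Note that eager execution is much slower than graph mode, so the dict-based fix further down the thread is preferable for real training runs.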


Hi @lseongjoo,

I am facing the same issue. Have you resolved it?
Thank you!


Hi dev-sajal,

I tried this, but it is not working.

Hi everyone, I finally found a solution.

  1. Do not use this line:
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))

In graph mode, the model returns the prediction as a “TFQuestionAnsweringModelOutput”, and this class inherits from a dictionary! So we need to pass the “y” value as a dictionary too.

  2. Replace every “start_positions” with “start_logits” and every “end_positions” with “end_logits”.

The reason is that when the loss is calculated,
TFQuestionAnsweringModelOutput has “start_logits” and “end_logits”,

but

y_true has “start_positions” and “end_positions”.

After the loss calculation, TensorFlow adds “start_logits” and “end_logits” keys to y_true and throws an error saying the lengths of y_pred and y_true are different. When you replace “start_positions” with “start_logits” (and likewise for the end keys), the problem is solved!
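The renaming in step 2 is just a key substitution on the label dict. A minimal sketch (rename_label_keys is a hypothetical helper name, not part of transformers):

```python
def rename_label_keys(labels):
    """Rename the tutorial's label keys to the names used in
    TFQuestionAnsweringModelOutput, so Keras can pair y_true with y_pred."""
    key_map = {"start_positions": "start_logits",
               "end_positions": "end_logits"}
    return {key_map.get(k, k): v for k, v in labels.items()}

print(rename_label_keys({"start_positions": 12, "end_positions": 17}))
# {'start_logits': 12, 'end_logits': 17}
```

In the tf.data pipeline this would go inside the map step, e.g. train_dataset.map(lambda x, y: (x, rename_label_keys(y))), in place of the tuple-producing lambda from step 1.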

NOTE: When you apply step 2, your training code will no longer work for PyTorch training :smiley: There you need to set the dictionary keys back to “start_positions” and “end_positions”. :smiley:
