from transformers import TFAutoModelForQuestionAnswering
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-base-multilingual-cased")
...
model.fit(...)
TypeError: The two structures don't have the same sequence type. Input structure has type <class 'tuple'>, while shallow structure has type <class 'transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput'>.
I suspect it is related to the way the output is formatted for Keras, as below:
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
since the error message is about the output and says that a tuple is not a transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput.
The error here was because return_dict=False was requested, but in TF we also have to set run_eagerly=True during model compilation for the return_dict parameter to actually take effect.
According to the documentation:
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple. This argument can be used in eager mode, in graph mode the value will always be set to True.
There was actually a warning saying that return_dict could not be set to False, which is how I was able to figure out this issue. I don't know the equivalent for PyTorch, but I hope this helps.
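As a minimal sketch (reusing the model and the tuple-style train_dataset from the snippets above; the loss, optimizer, learning rate, and batching are arbitrary choices, not from the original post): compiling with run_eagerly=True lets return_dict=False take effect, so the model returns a plain (start_logits, end_logits) tuple that Keras can match against the tuple labels.
import tensorflow as tf
from transformers import TFAutoModelForQuestionAnswering

model = TFAutoModelForQuestionAnswering.from_pretrained(
    "bert-base-multilingual-cased", return_dict=False
)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Without run_eagerly=True, graph mode forces return_dict back to True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=loss,
              run_eagerly=True)
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)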
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
In graph mode the model returns its predictions as a TFQuestionAnsweringModelOutput, and that class inherits from a dictionary, so we need to pass the y values as a dictionary too.
Replace every “start_positions” with “start_logits” and every “end_positions” with “end_logits”.
The reason is that when the loss is calculated, TFQuestionAnsweringModelOutput has "start_logits" and "end_logits", but y_true has "start_positions" and "end_positions". After the loss calculation, TensorFlow adds "start_logits" and "end_logits" keys to y_true and then throws an error about y_pred and y_true having different lengths. Once you replace start_positions with start_logits (and likewise for the end keys), the problem is solved!
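A minimal sketch of that replacement (reusing the train_dataset mapping from the snippet above): the label dictionary keys now match the names in the model's output, so Keras can line up y_true with the TFQuestionAnsweringModelOutput.
# Label keys must match the model's output names in graph mode
train_dataset = train_dataset.map(
    lambda x, y: (x, {"start_logits": y["start_positions"],
                      "end_logits": y["end_positions"]})
)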
NOTE: Once you apply this key replacement, your training code will no longer work for PyTorch training; there you need to keep the dictionary keys as "start_positions" and "end_positions".