[On model.fit()]: TypeError: Exception encountered when calling layer

I am not quite sure why I can’t run GPT2 for sequence classification.
The same code works perfectly when I use a GPT2 model that I have pretrained myself.
The code also runs perfectly for BERT (bert-base-uncased).
But somehow it doesn’t run when I use the ‘gpt2’ base model.

Kaggle Notebook: in this notebook I have run both BERT and GPT2, but GPT2 is not working.

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File :4
File /opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67 filtered_tb = _process_traceback_frames(e.__traceback__)
     68 # To get the full stack trace, call:
     69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72 del filtered_tb
File /tmp/__autograph_generated_fileaokm8mv0.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17 do_return = False
File /opt/conda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py:1638, in TFPreTrainedModel.train_step(self, data)
   1636 y_pred = self(x, training=True, return_loss=True)
   1637 else:
-> 1638 y_pred = self(x, training=True)
   1639 if self.using_dummy_loss:
   1640 loss = self.compiled_loss(y_pred.loss, y_pred.loss, sample_weight, regularization_losses=self.losses)
File /tmp/__autograph_generated_fileapf58knl.py:37, in outer_factory.<locals>.inner_factory.<locals>.tf__run_call_with_unpacked_inputs(self, *args, **kwargs)
     35 try:
     36 do_return = True
---> 37 retval_ = ag__.converted_call(ag__.ld(func), (ag__.ld(self),), dict(**ag__.ld(unpacked_inputs)), fscope)
     38 except:
     39 do_return = False
File /tmp/__autograph_generated_file889imx4d.py:56, in outer_factory.<locals>.inner_factory.<locals>.tf__call(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, labels, training)
     54 ag__.if_stmt(ag__.ld(input_ids) is not None, if_body, else_body, get_state, set_state, ('in_logits', 'sequence_lengths'), 2)
     55 sequence_lengths = ag__.Undefined('sequence_lengths')
---> 56 ag__.if_stmt(ag__.ld(self).config.pad_token_id is None, if_body_1, else_body_1, get_state_1, set_state_1, ('in_logits', 'sequence_lengths'), 2)
     57 loss = None
     59 def get_state_3():
File /tmp/__autograph_generated_file889imx4d.py:54, in outer_factory.<locals>.inner_factory.<locals>.tf__call.<locals>.else_body_1()
     52 ag__.converted_call(ag__.ld(logger).warning, (f'{ag__.ld(self).__class__.__name__} will not detect padding tokens in inputs_embeds. Results may be unexpected if using padding tokens in conjunction with inputs_embeds.',), None, fscope)
     53 sequence_lengths = ag__.Undefined('sequence_lengths')
---> 54 ag__.if_stmt(ag__.ld(input_ids) is not None, if_body, else_body, get_state, set_state, ('in_logits', 'sequence_lengths'), 2)
File /tmp/__autograph_generated_file889imx4d.py:46, in outer_factory.<locals>.inner_factory.<locals>.tf__call.<locals>.else_body_1.<locals>.if_body()
     44 nonlocal in_logits, sequence_lengths
     45 sequence_lengths = ag__.converted_call(ag__.ld(tf).argmax, (ag__.converted_call(ag__.ld(tf).cast, (ag__.converted_call(ag__.ld(tf).math.equal, (ag__.ld(input_ids), ag__.ld(self).config.pad_token_id), None, fscope), ag__.ld(input_ids).dtype), None, fscope),), dict(axis=-1), fscope) - 1
---> 46 sequence_lengths = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(sequence_lengths) >= 0, ag__.ld(sequence_lengths), ag__.ld(input_ids).shape[-1] - 1), None, fscope)
     47 in_logits = ag__.converted_call(ag__.ld(tf).gather, (ag__.ld(logits), ag__.ld(sequence_lengths)), dict(batch_dims=1, axis=1), fscope)

TypeError: in user code:
File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1284, in train_function  *
    return step_function(self, iterator)
File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1268, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in run_step  **
    outputs = model.train_step(data)
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1638, in train_step
    y_pred = self(x, training=True)
File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_fileapf58knl.py", line 37, in tf__run_call_with_unpacked_inputs
    retval_ = ag__.converted_call(ag__.ld(func), (ag__.ld(self),), dict(**ag__.ld(unpacked_inputs)), fscope)
File "/tmp/__autograph_generated_file889imx4d.py", line 56, in tf__call
    ag__.if_stmt(ag__.ld(self).config.pad_token_id is None, if_body_1, else_body_1, get_state_1, set_state_1, ('in_logits', 'sequence_lengths'), 2)
File "/tmp/__autograph_generated_file889imx4d.py", line 54, in else_body_1
    ag__.if_stmt(ag__.ld(input_ids) is not None, if_body, else_body, get_state, set_state, ('in_logits', 'sequence_lengths'), 2)
File "/tmp/__autograph_generated_file889imx4d.py", line 46, in if_body
    sequence_lengths = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(sequence_lengths) >= 0, ag__.ld(sequence_lengths), ag__.ld(input_ids).shape[-1] - 1), None, fscope)

TypeError: Exception encountered when calling layer 'tfgpt2_for_sequence_classification_2' (type TFGPT2ForSequenceClassification).    
in user code:

    File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1050, in run_call_with_unpacked_inputs  *
        return func(self, **unpacked_inputs)
    File "/opt/conda/lib/python3.10/site-packages/transformers/models/gpt2/modeling_tf_gpt2.py", line 1088, in call  *
        sequence_lengths = tf.where(sequence_lengths >= 0, sequence_lengths, input_ids.shape[-1] - 1)

    TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'


Call arguments received by layer 'tfgpt2_for_sequence_classification_2' (type TFGPT2ForSequenceClassification):
  • input_ids={'input_ids': 'tf.Tensor(shape=(16, None), dtype=int64)', 'attention_mask': 'tf.Tensor(shape=(16, None), dtype=int64)', 'labels': 'tf.Tensor(shape=(16,), dtype=int64)'}
  • past_key_values=None
  • attention_mask=None
  • token_type_ids=None
  • position_ids=None
  • head_mask=None
  • inputs_embeds=None
  • use_cache=None
  • output_attentions=None
  • output_hidden_states=None
  • return_dict=None
  • labels=None
  • training=True
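
My tentative reading of the last frame, as a minimal sketch rather than a confirmed diagnosis: the inputs are reported with shape (16, None), so `input_ids.shape[-1]` would be None and `input_ids.shape[-1] - 1` would fail exactly as in the message above. The snippet below is a plain-Python stand-in (no TensorFlow); the tuple `static_shape`, the helper `last_index`, and the fixed length 128 are hypothetical names and values used only for illustration.

```python
# Plain-Python stand-in for the failing expression; no TensorFlow required.
# Assumption: the static shape of input_ids is (16, None), as printed in the
# "Call arguments received" section, so the last dimension is None.
static_shape = (16, None)  # mirrors tf.Tensor(shape=(16, None), dtype=int64)


def last_index(shape):
    """Mimic `input_ids.shape[-1] - 1` from modeling_tf_gpt2.py, line 1088."""
    return shape[-1] - 1


try:
    last_index(static_shape)
    message = None
except TypeError as err:
    # Same TypeError text as the last line of the traceback.
    message = str(err)

print(message)

# With a known sequence length (hypothetical value 128) the expression works.
fixed_index = last_index((16, 128))
print(fixed_index)
```

If this reading is right, the error would come from the sequence dimension being unknown at trace time, so padding every batch to one fixed length might sidestep it. I have not confirmed that.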

I believe this should be a simple fix. Sorry for the trouble, but any help would be appreciated.

@sgugger
@amyeroberts