Shape mismatch between labels and logits

Hello everyone, I really hope this is the correct category for this question. I’m using TFAutoModelForSequenceClassification for a multi-label classification task. The dataset has a text column and 20 label columns, one per class. A 1 in a column means the example belongs to that class. An example row could be:

‘It’s cold today’ 0 0 1 1 0 1 0
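To make the layout concrete, here is a minimal sketch of how one such row maps to a single array of zeros and ones. The column names are made up for illustration (my real dataset has 20 label columns), and `row_to_multi_hot` is a hypothetical helper, not part of any library.

```python
# Hypothetical label column names; the real dataset has 20 of them.
label_columns = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]


def row_to_multi_hot(row, columns):
    """Collect the 0/1 value of each label column into one list."""
    return [int(row[name]) for name in columns]


# One example row: the text plus one 0/1 flag per label column.
row = {"text": "It's cold today",
       "c0": 0, "c1": 0, "c2": 1, "c3": 1, "c4": 0, "c5": 1, "c6": 0}

labels = row_to_multi_hot(row, label_columns)
print(labels)  # [0, 0, 1, 1, 0, 1, 0]
```

Each example therefore carries one label vector whose length equals the number of classes.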

I loaded the DataFrame into a HF dataset and loaded the model and tokenizer with:

from datasets import Dataset
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, create_optimizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=num_labels, problem_type="multi_label_classification", id2label=id2labels, label2id=labels2id)

I then proceeded to convert the dataset into:

‘Tokenized text’ | ‘Array of 0s and 1s’

To do this I wrote this code:

def tokenize_and_encode(val, tokenizer, max_length):
  tokenized = tokenizer(val['Premise'], truncation=True, padding='max_length', max_length=max_length)
  labels = []
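  for index in id2labels.keys():
    # Convert the columns into a single array of zeros and ones
    # (assumption: each label column is named after its label in id2labels)
    labels.append(int(val[id2labels[index]]))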
  return {'input_ids': tokenized['input_ids'],
          'attention_mask': tokenized['attention_mask'],
          'labels': labels}

train_dataset = Dataset.from_pandas(train_df)

train_dataset = train_dataset.map(lambda x: tokenize_and_encode(x, tokenizer, 200), remove_columns=train_dataset.column_names)

I then prepared the dataset and the model for the training phase:

batch_size = 16
num_epochs = 3
batches_per_epoch = len(train_dataset) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)
optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)

tf_train_set = bert.prepare_tf_dataset(


Up to this point the code runs without problems. But when I call:

history = bert.fit(tf_train_set, epochs=5)

I get a very long error, but I think the most important part is:

ValueError: `labels.shape` must equal `logits.shape` except for the last dimension. Received: labels.shape=(320,) and logits.shape=(16, 20)
    Call arguments received by layer "tf_bert_for_sequence_classification" (type TFBertForSequenceClassification):
      • self={'input_ids': 'tf.Tensor(shape=(16, 200), dtype=int64)', 'attention_mask': 'tf.Tensor(shape=(16, 200), dtype=int64)', 'labels': 'tf.Tensor(shape=(16, 20), dtype=int64)'}
      • input_ids=None
      • attention_mask=None
      • token_type_ids=None
      • position_ids=None
      • head_mask=None
      • inputs_embeds=None
      • output_attentions=None
      • output_hidden_states=None
      • return_dict=None
      • labels=None
      • training=True

The error says the model received labels of shape (320,) when the logits have shape (16, 20), yet the call arguments listed below the error show that I did pass labels of shape (16, 20). It’s as if my labels are being flattened somewhere. I can’t figure out what’s happening.
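As a sanity check on the numbers: 320 is exactly 16 × 20, which is consistent with my (16, 20) label batch being collapsed into one dimension before the loss is computed. The sketch below only reproduces that arithmetic with placeholder lists. It does not touch the model, and where the flattening actually happens is my assumption, not something I have confirmed.

```python
batch_size, num_labels = 16, 20

# Placeholder multi-hot labels: one row per example, one column per class.
labels = [[0] * num_labels for _ in range(batch_size)]

# Flatten the batch into a single dimension, as the error message suggests.
flat = [value for row in labels for value in row]

print((len(labels), len(labels[0])))  # (16, 20)
print(len(flat))  # 320
```

So the (320,) in the error matches what I would get if every label in the batch were laid out end to end.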

Thank you very much to all of you.