AutoTrain error with Sequential data on evaluation loop

Hi all :wave:,

I’m currently training a sequence-to-sequence model using the Trainer class. The training loop runs fine, but as soon as the evaluation loop starts I get the following error:

Traceback (most recent call last):
  File "/home/vdm/SLAM-ASR/src/", line 109, in <module>
  File "/home/vdm/SLAM-ASR/src/", line 105, in main
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 1624, in train
    return inner_training_loop(
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 2029, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 2412, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 3229, in evaluate
    output = eval_loop(
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 3452, in evaluation_loop
    preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 123, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 125, in nested_concat
    return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
  File "/home/vdm/.pyenv/versions/3.10.13/envs/SLAM-ASR/lib/python3.10/site-packages/transformers/", line 84, in torch_pad_and_concatenate
    return torch.cat((tensor1, tensor2), dim=0)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 229 but got size 211 for tensor number 1 in the list.

To debug this, I printed the shapes of my logits, my labels, and my shifted logits (used to compute the loss) for two consecutive batches. Here are the logs:

outputs.logits.shape=torch.Size([8, 229, 51200]) shift_logits.shape=torch.Size([488, 51200]) shift_labels.shape=torch.Size([488])
outputs.logits.shape=torch.Size([8, 211, 51200]) shift_logits.shape=torch.Size([416, 51200]) shift_labels.shape=torch.Size([416])

It looks like the evaluation loop expects the output sequence length to be the same across all batches. But if that were the case, why would the training loop work and not the evaluation loop?

I’m really confused about this, so if anyone has an idea I would love to hear it :smiley:

Source Code

data processing

def add_raw_speech_feature_to_dataset(batch, processor):
    value = processor(

    batch["input_values"] = value

    batch["input_length"] = len(batch["input_values"])

    batch["labels"] = processor(
        text=batch["text"].capitalize() + ".",

    return batch


class DataCollator:

    processor: Wav2Vec2Processor
    padding: Union[bool, str] = True

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:

        # split inputs and labels since they have different lengths and need different padding methods
        input_features = [{"input_values": feature["input_values"]} for feature in features]
        label_features = [{"input_ids": feature["labels"]} for feature in features]

        batch = self.processor.pad(

        labels_batch = self.processor.pad(

        labels = labels_batch["input_ids"].masked_fill(, -100)

        batch["labels"] = labels

        return batch

It doesn’t look like you are using AutoTrain. Please post this in the Transformers discussions.

My bad, you’re right: I confused Trainer with AutoTrain. The category has been updated :slight_smile:

This is partially fixed. It appears that at evaluation time Trainer tries to concatenate the past_key_values attribute of the CausalLMOutputWithPast returned by the model across batches.

However, as the model is sequence-to-sequence, each sequence can have a different length. Within a batch, every sequence is padded to the same length, but that length is not the same from one batch to the next. Tensors coming from two different batches can therefore have different sequence lengths (229 vs. 211 in the error above), which makes the concatenation fail.
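The shape problem described above can be sketched on plain tuples, without torch. This is a simplified model of a concatenation helper that pads only dimension 1 before concatenating on dimension 0, which is an assumption about how the helper in the traceback behaves. `pad_and_concat_shape` is a hypothetical name, and the cache layout (batch, heads, seq_len, head_dim) with 32 heads and a head size of 64 is made up for illustration.

```python
def pad_and_concat_shape(shape1, shape2):
    """Return the shape of a dim-0 concat that is only able to pad dim 1.

    Works on shape tuples instead of tensors. Raises ValueError in the
    cases where a real tensor concatenation would fail.
    """
    if len(shape1) == 1 or shape1[1] == shape2[1]:
        # Direct concat: every dimension except 0 must already match.
        if tuple(shape1[1:]) != tuple(shape2[1:]):
            raise ValueError(f"sizes must match except in dim 0: {shape1} vs {shape2}")
        return (shape1[0] + shape2[0],) + tuple(shape1[1:])
    # Padding branch: dim 1 grows to the longer of the two inputs,
    # but the trailing dimensions still have to agree.
    if tuple(shape1[2:]) != tuple(shape2[2:]):
        raise ValueError(f"trailing dims differ: {shape1} vs {shape2}")
    return (shape1[0] + shape2[0], max(shape1[1], shape2[1])) + tuple(shape1[2:])


# Logits from the two logged batches: the sequence length sits in dim 1,
# so padding dim 1 is enough.
print(pad_and_concat_shape((8, 229, 51200), (8, 211, 51200)))

# Hypothetical key/value cache shapes: the sequence length sits in dim 2,
# dim 1 (heads) already matches, so no padding happens and the concat fails.
try:
    pad_and_concat_shape((8, 32, 229, 64), (8, 32, 211, 64))
except ValueError as err:
    print("concat fails:", err)
```

If this model is right, it would explain why the logits themselves never trigger the error while the cached keys and values do.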

Returning None for this key “fixes” the issue, but I don’t understand why:

  • This behavior only occurs at evaluation time and not during the training loop
  • The evaluation loop would need the past_key_values attribute at all

If anyone has insights, I would be curious to hear them :slight_smile:
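One plausible reading of the two open points, offered as an assumption rather than something confirmed in this thread: a training step only needs the loss, whereas the evaluation loop gathers every non-loss output of every batch so that predictions can be handed to the metric computation. Under that reading, past_key_values is not needed for evaluation; it is simply collected along with the logits unless it is filtered out. The sketch below imitates that filtering on a plain dict. `collect_eval_outputs` is a hypothetical name and the strings stand in for real tensors.

```python
def collect_eval_outputs(outputs, ignore_keys=()):
    """Keep every model output except the loss and explicitly ignored keys."""
    skip = set(ignore_keys) | {"loss"}
    return tuple(value for key, value in outputs.items() if key not in skip)


# Stand-in for the model output; strings replace real tensors.
outputs = {"loss": "loss", "logits": "logits", "past_key_values": "cache"}

# Without filtering, the cache is gathered next to the logits.
print(collect_eval_outputs(outputs))  # ('logits', 'cache')

# Ignoring the key leaves only the logits to be concatenated across batches.
print(collect_eval_outputs(outputs, ignore_keys=["past_key_values"]))  # ('logits',)
```

The traceback shows an `ignore_keys_for_eval` argument being forwarded from `train` to `evaluate`, so calling `trainer.train(ignore_keys_for_eval=["past_key_values"])` may be an alternative to returning None from the model. This has not been verified against the setup in this thread.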