How is the eval dataset processed in a trainer?

Hi, I'm currently trying to compute metrics for a SpeechT5 model on an ASR task, but I get the following error when I try to use my compute_metrics function: "Sizes of tensors must match except in dimension 0. Expected size 216 but got size 231 for tensor number 1 in the list." My compute_metrics function and my Seq2SeqTrainer look like this:

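```python
import evaluate
from transformers import Seq2SeqTrainer
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

# `processor` (a SpeechT5Processor) is created earlier in the script.
metric = evaluate.load("wer")
# Assumption: the normalized strings come from a simple text normalizer such as this one.
normalizer = BasicTextNormalizer()


def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids  # the attribute is `label_ids`, not `labels_ids`

    # -100 marks ignored/padded positions; map it back to the pad token so it can be decoded
    pred_ids[pred_ids == -100] = processor.tokenizer.pad_token_id
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)

    # Orthographic WER on the raw decoded text
    wer_ortho = 100 * metric.compute(predictions=pred_str, references=label_str)

    # Normalized WER (lower-cased, punctuation stripped)
    pred_str_norm = [normalizer(p) for p in pred_str]
    label_str_norm = [normalizer(l) for l in label_str]
    wer = 100 * metric.compute(predictions=pred_str_norm, references=label_str_norm)

    return {"wer_ortho": wer_ortho, "wer": wer}


trainer = Seq2SeqTrainer(
    # ... remaining arguments (model, args, datasets, data collator) not shown
    compute_metrics=compute_metrics,
)
```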
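For reference, the "wer" metric from evaluate.load("wer") reports a corpus-level word error rate: the total number of word substitutions, deletions and insertions divided by the total number of reference words. The sketch below computes that same quantity in plain Python with a word-level Levenshtein distance, which can help sanity-check the decoded strings outside the Trainer. `word_error_rate` is a hypothetical helper, not part of any library, and it splits words on whitespace only, so results may differ slightly from jiwer's preprocessing.

```python
from typing import List


def word_error_rate(references: List[str], predictions: List[str]) -> float:
    """Corpus-level WER: (substitutions + deletions + insertions) / reference words."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, predictions):
        r, h = ref.split(), hyp.split()
        # prev[j] = edit distance between the first i reference words and first j hypothesis words
        prev = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            cur = [i] + [0] * len(h)
            for j in range(1, len(h) + 1):
                cost = 0 if r[i - 1] == h[j - 1] else 1
                cur[j] = min(
                    prev[j] + 1,         # deletion
                    cur[j - 1] + 1,      # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            prev = cur
        errors += prev[len(h)]
        total += len(r)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


# 1 substitution (cat -> bat) + 1 insertion (down) over 3 reference words
print(100 * word_error_rate(["the cat sat"], ["the bat sat down"]))
```

Multiplying by 100, as in the compute_metrics function, turns the ratio into a percentage.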

I have read about similar errors on the forum, but I'm not sure I understood most of the answers correctly. I suspect that the eval_dataset is not passed through the data collator and is therefore never padded. Others said it has to do with the padding length: I currently use the "longest" padding strategy, and some people suggested switching to "max_length". However, I pad with the processor, and my audio inputs are much longer than my text labels, so padding to max_length causes CUDA out-of-memory errors. Is there another way to get rid of this error effectively? Any help is appreciated!
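The error text matches what torch.cat reports when tensors joined along dimension 0 disagree in another dimension, so 216 and 231 are probably two different padded sequence lengths (for example, the label or generated-token widths of two eval batches; that part is a guess). The pure-Python sketch below uses hypothetical helpers `pad_batch`, `strict_concat` and `concat_batches`, with lists of token ids standing in for 2-D tensors, to show why batches padded to their own longest sequence cannot be stacked directly, and how padding them to one shared width with -100 avoids that. It only illustrates the idea; it is not the Trainer's or the processor's actual code.

```python
from typing import List, Optional

# Plain lists of token ids stand in for 2-D tensors of shape (batch, seq_len).
Batch = List[List[int]]


def pad_batch(seqs: Batch, pad_value: int, length: Optional[int] = None) -> Batch:
    """Pad each row to `length`, or to the longest row in the batch when length is None."""
    target = length if length is not None else max(len(s) for s in seqs)
    if any(len(s) > target for s in seqs):
        raise ValueError("a sequence is longer than the target length")
    return [s + [pad_value] * (target - len(s)) for s in seqs]


def strict_concat(batches: List[Batch]) -> Batch:
    """Mimic torch.cat(dim=0): every row must already have the same width."""
    width = len(batches[0][0])
    for i, b in enumerate(batches):
        for row in b:
            if len(row) != width:
                raise ValueError(
                    "Sizes must match except in dimension 0. "
                    f"Expected size {width} but got size {len(row)} for tensor number {i}"
                )
    return [row for b in batches for row in b]


def concat_batches(batches: List[Batch], pad_value: int = -100) -> Batch:
    """Pad every batch to the widest row across all batches, then concatenate."""
    width = max(len(row) for b in batches for row in b)
    return strict_concat([pad_batch(b, pad_value, width) for b in batches])


# "longest" padding gives each batch its own width
batch_a = pad_batch([[5, 6], [7, 8, 9]], pad_value=-100)  # width 3
batch_b = pad_batch([[1, 2, 3, 4, 5]], pad_value=-100)    # width 5

# strict_concat([batch_a, batch_b]) would raise: expected size 3 but got size 5
merged = concat_batches([batch_a, batch_b])
print(merged)  # [[5, 6, -100, -100, -100], [7, 8, 9, -100, -100], [1, 2, 3, 4, 5]]
```

If the mismatch really is in the label or prediction lengths, only those short token-id sequences need a shared width, which costs far less memory than padding the audio inputs to max_length.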