TrOCR, CER metric error

Kforcode · December 9, 2021, 7:02am

I am fine-tuning TrOCR and using the Character Error Rate (CER) from jiwer as the metric.

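from datasets import load_metric

cer_metric = load_metric("cer")  # assumed setup, as in the tutorial (uses jiwer)

def compute_cer(pred_ids, label_ids, processor):
    # Decode the predicted token IDs into strings
    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    # -100 marks positions ignored by the loss; swap in the pad token so they can be decoded
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    print(f"len of label_ids {len(label_ids)}")
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    print(f"len_pred_str={len(pred_str)}, len_label={len(label_str)}")
    cer = cer_metric.compute(predictions=pred_str, references=label_str)
    return cer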

Except for the print statements, the code is a direct copy from @nielsr's tutorial. Even though len(pred_str) and len(label_str) are the same, I am getting:

ValueError: number of ground truth inputs (17) and hypothesis inputs (24) must match.

I have attached a screenshot of the error.

Please let me know if you have any idea what might be causing the issue.

nielsr · December 9, 2021, 10:05am

I believe this was a bug that has since been fixed; see the topic "Datasets.load_metric("cer") does not work".
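
For anyone who cannot upgrade right away, one way to sanity-check the numbers is to compute CER by hand, without going through the metric wrapper. The snippet below is only a minimal sketch: `levenshtein` and `simple_cer` are hypothetical helper names, not part of jiwer or Datasets, and the result may differ slightly from the library metric (for example in how whitespace is handled). It assumes CER is defined as the total character edit distance divided by the total number of reference characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def simple_cer(predictions, references):
    """Hypothetical helper: corpus-level CER over paired strings."""
    if len(predictions) != len(references):
        raise ValueError(
            f"got {len(predictions)} predictions but {len(references)} references"
        )
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars


# One substitution ("e" -> "a") over 8 reference characters
print(simple_cer(["abc", "hello"], ["abc", "hallo"]))  # 0.125
```

Inside compute_cer, this could stand in for the cer_metric.compute call as simple_cer(pred_str, label_str). Because each prediction is compared with its own reference, the sentence-count mismatch from the error above cannot occur as long as the two lists have the same length.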
