TrOCR repeated generation

@nielsr I am using microsoft/trocr-large-printed and there is a slight issue: the model generates repeated predictions on my dataset.

As you can see below, the left is the ground truth and the right is the model prediction:
[image: ground truth vs. model prediction, with the predicted text repeating]

After generating the correct text, the model does not stop and keeps repeating it.
Do you know what might be the issue? Am I missing a parameter in the generate function?
My decoding code looks like this:

for batch in tqdm(test_dataloader):
    # predict using generate
    pixel_values = batch["pixel_values"].to(device)
    outputs = model.generate(pixel_values, output_scores=True, return_dict_in_generate=True, max_length=22)
    
    # decode
    pred_str = processor.batch_decode(outputs.sequences, skip_special_tokens=True)

Thanks once again!

Hi,

Thanks for reporting. This has been reported before (see this). This probably has to do with the settings of the generate() method, which uses greedy decoding by default. Note that the original implementation uses beam search.

Greedy decoding is prone to repetition, so for tasks like TrOCR one should consider beam search. To further discourage repetition, one can use the no_repeat_ngram_size argument, and one should also set appropriate max_length and min_length values.
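
For example, something along these lines (a minimal sketch, assuming the same model, processor and pixel_values as in the snippet above; the values of num_beams, no_repeat_ngram_size and the length limits are illustrative, not tuned):

# Use beam search instead of the default greedy decoding, plus an n-gram
# repetition penalty; tune these values for your own dataset.
outputs = model.generate(
    pixel_values,
    num_beams=4,                # beam search instead of greedy decoding
    no_repeat_ngram_size=3,     # never repeat the same 3-gram
    min_length=1,
    max_length=22,
    early_stopping=True,
    output_scores=True,
    return_dict_in_generate=True,
)
pred_str = processor.batch_decode(outputs.sequences, skip_special_tokens=True)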

I’ll investigate this a bit. Feel free to experiment with the settings of the generate method.


Hi,

After investigation, it turns out the generate() method currently does not take into account config.decoder.eos_token_id, only config.eos_token_id.

You can fix it by setting model.config.eos_token_id = 2.
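
As a one-line sketch of that workaround (2 is the EOS token id mentioned above, which should match config.decoder.eos_token_id):

# Copy the decoder's EOS token id to the top-level config so that
# generate() knows when to stop; for this checkpoint that id is 2.
model.config.eos_token_id = model.config.decoder.eos_token_id  # i.e. 2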

We will fix this soon.


Awesome, it worked. Thank you!