@nielsr I am using microsoft/trocr-large-printed
There is a slight issue: the model generates repeated predictions on my dataset.
As you can see, the left is the ground truth and the right is the model prediction;
after generating the correct text, the model does not stop and keeps repeating the same output.
Do you know what might be the issue? Am I missing a parameter in the generate() function?
My decoding code looks like this:
for batch in tqdm(test_dataloader):
    # predict using generate
    pixel_values = batch["pixel_values"].to(device)
    outputs = model.generate(pixel_values, output_scores=True, return_dict_in_generate=True, max_length=22)
    # decode
    pred_str = processor.batch_decode(outputs.sequences, skip_special_tokens=True)
Thanks for reporting. This has been reported before (see this). It probably has to do with the settings of the generate() method, which uses greedy decoding by default. Note that the original implementation uses beam search.
Greedy decoding is prone to repetition, so for tasks like TrOCR one should consider beam search. To further reduce repetition, one can use the no_repeat_ngram_size argument, and one should also set appropriate max_length and min_length values.
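For example, here is a minimal sketch of your decoding loop using beam search; the specific values for num_beams, no_repeat_ngram_size, max_length and min_length are illustrative assumptions to experiment with, not the settings of the original implementation:

for batch in tqdm(test_dataloader):
    pixel_values = batch["pixel_values"].to(device)
    # beam search instead of the default greedy decoding
    outputs = model.generate(
        pixel_values,
        num_beams=5,                 # assumed beam width, tune for your dataset
        no_repeat_ngram_size=3,      # block repeated trigrams to curb repetition
        max_length=64,               # assumed upper bound on generated length
        min_length=1,                # assumed lower bound
        early_stopping=True,         # stop once all beams have finished
        output_scores=True,
        return_dict_in_generate=True,
    )
    pred_str = processor.batch_decode(outputs.sequences, skip_special_tokens=True)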
I’ll investigate this a bit. In the meantime, feel free to experiment with the settings of the generate() method.