@nielsr I am using microsoft/trocr-large-printed
There is a slight issue: the model generates repeated predictions on my dataset.
If you look, the left is the ground truth and the right is the model prediction.
After generating the correct text, the model does not stop and keeps repeating it.
Do you know what might be the issue? Am I missing a parameter in the generate function?
My decoding code looks like this:
```python
for batch in tqdm(test_dataloader):
    # predict using generate
    pixel_values = batch["pixel_values"].to(device)
    outputs = model.generate(pixel_values, output_scores=True, return_dict_in_generate=True, max_length=22)
    pred_str = processor.batch_decode(outputs.sequences, skip_special_tokens=True)
```
Thanks once again!
Thanks for reporting. This has been reported before (see this). It probably has to do with the settings of the
generate() method, which uses greedy decoding by default. Note that the original implementation uses beam search.
Greedy decoding often produces a lot of repetition, so for tasks like TrOCR one should consider beam search. To avoid repetition, one can use the
no_repeat_ngram_size argument. One should also set the right eos_token_id.
I’ll investigate this a bit. Feel free to experiment with the settings of the generate method.
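To see what `no_repeat_ngram_size` does conceptually, here is a toy, pure-Python sketch of the underlying idea (blocking any next token that would repeat an n-gram already present in the sequence). This is only an illustration, not the Hugging Face implementation; `banned_next_tokens` is a hypothetical helper:

```python
def banned_next_tokens(generated, n):
    """Toy sketch of the idea behind no_repeat_ngram_size:
    return the tokens that would complete an n-gram already seen."""
    if len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    # Scan every n-gram produced so far; if its first n-1 tokens match the
    # current suffix, its final token must not be emitted next.
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

# With n=3 and the tokens 7 8 9 7 8 already generated, emitting 9 would
# repeat the trigram (7, 8, 9), so 9 is banned.
print(banned_next_tokens([7, 8, 9, 7, 8], 3))  # → {9}
```

In the real library you would simply pass `num_beams=...` and `no_repeat_ngram_size=...` to `model.generate()` instead of implementing this yourself.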
After investigation, it turns out the generate() method currently does not take into account the decoder's eos_token_id.
You can fix it by setting
model.config.eos_token_id = 2.
We will fix this soon.
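To make it concrete why a missing EOS id shows up as repetition, here is a toy greedy loop (the `fake_model` function is a stand-in for the model's argmax step, not TrOCR): when the EOS id is wrong or unset, the only stopping condition left is `max_length`, so the decoder keeps emitting tokens.

```python
def toy_greedy_decode(next_token_fn, eos_token_id, max_length):
    """Toy greedy loop: generation stops early only when the emitted
    token equals eos_token_id; otherwise it runs to max_length."""
    sequence = []
    for _ in range(max_length):
        token = next_token_fn(sequence)
        sequence.append(token)
        if token == eos_token_id:  # correct EOS id -> stop here
            break
    return sequence

# A fake "model" that emits 5, 6, then the EOS id 2, then degenerates
# into repeating 6 — mimicking the repetition seen in the issue.
fake_model = lambda seq: [5, 6, 2, 6, 6, 6, 6, 6, 6, 6][len(seq)]

print(toy_greedy_decode(fake_model, eos_token_id=2, max_length=10))
# → [5, 6, 2]  (stops at EOS)
print(toy_greedy_decode(fake_model, eos_token_id=None, max_length=10))
# → [5, 6, 2, 6, 6, 6, 6, 6, 6, 6]  (EOS never matched, runs to max_length)
```

This is why setting `model.config.eos_token_id = 2` (the id of TrOCR's end-of-sequence token) makes generation terminate properly.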
Awesome, it worked. Thank you!