TrOCR only outputs upper case?

Hi everyone,

I’ve been tinkering with text recognition and found that TrOCR works fantastically. The one weird thing is that it only outputs uppercase. Is this a setting I can configure on the generation side, or is it just how the model was trained?

In case anyone is curious, I believe it’s because of the data the model was trained on. If you’re using trocr-large-printed (or any of the “printed” TrOCR models), the training labels are in all caps, so the model has learned to return everything in uppercase. If you switch to the handwritten models, they return both upper and lower case.
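
For anyone who wants to try the switch, here is a rough sketch using the Hugging Face transformers library. It assumes the checkpoints follow the microsoft/trocr-{size}-{style} naming on the Hub and that you have transformers, torch and Pillow installed. The helper names (trocr_checkpoint, recognize) and the file name line.png are just placeholders I made up, so treat this as a starting point rather than the official way to do it.

```python
def trocr_checkpoint(size="base", style="handwritten"):
    """Build a Hub id such as 'microsoft/trocr-base-handwritten'.

    Assumes the 'microsoft/trocr-{size}-{style}' naming scheme, where
    style is 'printed' or 'handwritten'.
    """
    return f"microsoft/trocr-{size}-{style}"


def recognize(image_path, checkpoint):
    """Run one text-line image through a TrOCR checkpoint and return the text."""
    # Imported lazily so trocr_checkpoint() works without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    # TrOCR expects a single cropped text line as an RGB image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


if __name__ == "__main__":
    # 'line.png' is a hypothetical image of one line of text.
    for style in ("printed", "handwritten"):
        print(style, "->", recognize("line.png", trocr_checkpoint("large", style)))
```

Running both styles on the same image makes it easy to see whether the casing difference shows up on your own data. If you need the printed model's accuracy but want proper casing, I suspect the only real option is fine-tuning on mixed-case labels, but I haven't tried that myself.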