Why is LayoutLMv2 Bad at Token Classification?

Since this is a pretrained model trained on invoice data, I expected this to run pretty smoothly. I gave it the following 9 labels and asked it to classify the tokens of the invoice document below:

labels = ['Contact Info', 'Address', 'Time', 'Date', 'Cost', 'Title', 'Table', 'Logo', 'Signature']
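
For context, here is a minimal sketch of how a label list like this is usually attached to the model with the Hugging Face transformers library. The checkpoint name `microsoft/layoutlmv2-base-uncased` and the `load_model` helper are assumptions for illustration, not necessarily the exact setup used here. One thing worth keeping in mind: if the checkpoint was not fine-tuned on these exact labels, the token classification head is newly initialized, so its predictions are not meaningful until the model is fine-tuned on annotated examples.

```python
labels = ['Contact Info', 'Address', 'Time', 'Date', 'Cost',
          'Title', 'Table', 'Logo', 'Signature']

# Map between label names and the integer class ids the model predicts.
id2label = dict(enumerate(labels))
label2id = {name: idx for idx, name in id2label.items()}


def load_model(checkpoint="microsoft/layoutlmv2-base-uncased"):
    """Load LayoutLMv2 with a token classification head sized to `labels`.

    `checkpoint` is an assumed default. With a base (not fine-tuned)
    checkpoint, the classification layer is randomly initialized.
    """
    # Imported lazily so the label mappings above work without transformers.
    from transformers import LayoutLMv2ForTokenClassification

    return LayoutLMv2ForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
```

Calling `load_model()` requires transformers plus LayoutLMv2's extra dependencies (such as detectron2) to be installed. The mappings on their own only need plain Python.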

And this is the result (the picture is available on Google Images if you search for “invoice pic”):


This is just a screenshot of the full document, and I haven’t truncated anything. I thought the model was supposed to give great results right out of the box, but apparently not. It picks up all the text, but it misclassifies most of it.

Any tips on what to do?

P.S. Could LayoutLMv2 also be used to extract the layout of GUI software screenshots? Or is there already another model that does that?

Thanks.