LayoutLMv3 Inference

Hi, I have seen the tutorial from @nielsr: Transformers-Tutorials/LayoutLMv3 at master · NielsRogge/Transformers-Tutorials · GitHub

However, I would like to know how to get the words inside each box. In his example, the boxes are only used for drawing.

Did you have any luck in finding how to get the words of each box?

Could you clarify? The OCR engine gives you the boxes along with their words.
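
To illustrate what I mean, here is a minimal sketch, assuming your OCR output has the dictionary shape that pytesseract's `image_to_data(..., output_type=Output.DICT)` returns (parallel lists under `text`, `left`, `top`, `width`, `height`). The helper name `words_with_boxes` and the sample values are made up for illustration. The function pairs every non-empty word with its box and scales the coordinates to the 0-1000 range that LayoutLM-style models expect.

```python
def words_with_boxes(ocr, image_width, image_height):
    """Pair each non-empty OCR word with its box, scaled to 0-1000.

    `ocr` is assumed to be a dict of parallel lists, as produced by
    pytesseract.image_to_data(image, output_type=Output.DICT).
    """
    pairs = []
    for text, left, top, width, height in zip(
        ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]
    ):
        word = text.strip()
        if not word:
            # Tesseract also emits entries for blocks and lines with empty text.
            continue
        box = [
            int(1000 * left / image_width),
            int(1000 * top / image_height),
            int(1000 * (left + width) / image_width),
            int(1000 * (top + height) / image_height),
        ]
        pairs.append((word, box))
    return pairs


# Hand-written sample data (hypothetical), not real OCR output.
sample_ocr = {
    "text": ["", "Invoice", "Total:", " "],
    "left": [0, 50, 100, 0],
    "top": [0, 20, 200, 0],
    "width": [500, 100, 50, 0],
    "height": [400, 40, 20, 0],
}

pairs = words_with_boxes(sample_ocr, image_width=500, image_height=400)
for word, box in pairs:
    print(word, box)
```

Since the words and boxes come out in the same order, the word for box `i` is simply the word at index `i`, so anything you compute per box can be matched back to its text by position.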