LayoutLMv3 Inference

Hi, I have seen the LayoutLMv3 tutorial from @nielsr (Transformers-Tutorials/LayoutLMv3 at master · NielsRogge/Transformers-Tutorials · GitHub).

However, I would like to know how to get the words inside each box, because his example only uses the predictions to draw the boxes.
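
For reference, here is a minimal sketch of what I am trying to achieve, not a confirmed solution. It assumes the OCR words and their boxes are available as plain lists, and that the token-to-word mapping comes from something like `encoding.word_ids()` (available with fast tokenizers in transformers). The helper name `words_per_box` and the sample values are hypothetical and only illustrate the idea: take the prediction of the first sub-token of each word and pair it with that word and its box.

```python
def words_per_box(words, boxes, word_ids, token_predictions, id2label):
    """Pair each word with its box and the label predicted for its first sub-token.

    words, boxes      : one entry per OCR word (assumed to be kept from the OCR step)
    word_ids          : one entry per token; None marks special tokens
    token_predictions : one predicted class id per token
    id2label          : mapping from class id to label name
    """
    results = []
    seen = set()
    for token_idx, word_idx in enumerate(word_ids):
        # Skip special tokens and every sub-token after the first one of a word.
        if word_idx is None or word_idx in seen:
            continue
        seen.add(word_idx)
        results.append(
            {
                "word": words[word_idx],
                "box": boxes[word_idx],
                "label": id2label[token_predictions[token_idx]],
            }
        )
    return results


# Made-up sample data standing in for real OCR output and model predictions.
words = ["Invoice", "12345"]
boxes = [[10, 10, 80, 30], [90, 10, 150, 30]]
word_ids = [None, 0, 1, 1, None]       # "12345" was split into two sub-tokens
token_predictions = [0, 1, 2, 2, 0]    # argmax over the logits, one id per token
id2label = {0: "O", 1: "B-HEADER", 2: "B-ANSWER"}

results = words_per_box(words, boxes, word_ids, token_predictions, id2label)
for item in results:
    print(item["word"], item["box"], item["label"])
```

With a real model, `token_predictions` would presumably come from `outputs.logits.argmax(-1)` and `id2label` from `model.config.id2label`, but I am not sure how to get the `words` list when the processor runs OCR itself.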