Hi @Mit1208
If I understand correctly, the attached image shows the bounding boxes obtained from the post-processing step of Kosmos2Processor:
processed_text, entities = processor.post_process_generation(test_decode)
And you want to demonstrate with this image that these bounding boxes do not match the original input bounding boxes (which you specified via set_box and then passed through normalized_box(convert_box). Is this correct?
I can see your image width/height is set to 1025 (I didn’t verify this, however). Kosmos2Processor splits the image into a 32 x 32 grid. With the original input size of 224, each cell is of size 7x7. However, with your image size of 1025, each cell is roughly 32x32, which I believe is quite coarse for document AI.
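To make the arithmetic concrete, here is a minimal sketch of the cell-size calculation. The helper name cell_size is hypothetical and not part of Kosmos2Processor; the sketch only assumes a square image divided evenly into a 32 x 32 grid, as described above.

```python
def cell_size(image_side: int, grid_side: int = 32) -> float:
    """Side length (in pixels) of one grid cell for a square image.

    Hypothetical helper for illustration; assumes the image is divided
    evenly into grid_side x grid_side cells.
    """
    return image_side / grid_side


print(cell_size(224))   # 7.0
print(cell_size(1025))  # 32.03125
```

If box coordinates are snapped to grid cells, a round trip can plausibly shift each coordinate by up to about one cell, so roughly 32 pixels at this image size.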
To verify this, you can compare the original input bboxes against the final processed/computed output bboxes, and see whether their differences are within the range of 32 (or even 64). If the differences are all inside this range, I wouldn’t say there is something wrong in the code of Kosmos2Processor. In that case, it’s just a limitation of this kind of processing.
If the differences are larger than 32 or 64, something is likely wrong and I can take a closer look.
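A rough sketch of that check could look like the following. It assumes both lists hold boxes as (x1, y1, x2, y2) in pixel coordinates and in the same order; the function names are hypothetical, and any conversion from the processor's output format to pixels is left to you.

```python
def max_box_deviation(original, recovered):
    """Largest absolute per-coordinate difference between paired boxes.

    Assumes both lists contain (x1, y1, x2, y2) tuples in pixels,
    with matching order (hypothetical helper, not a library function).
    """
    return max(
        abs(o - r)
        for orig_box, rec_box in zip(original, recovered)
        for o, r in zip(orig_box, rec_box)
    )


def within_grid_tolerance(original, recovered, image_side=1025, grid_side=32, cells=1):
    """True if every coordinate differs by at most `cells` grid cells."""
    tolerance = cells * image_side / grid_side
    return max_box_deviation(original, recovered) <= tolerance


# Example with made-up boxes: the largest difference is 20 px, under one cell.
original = [(100, 200, 300, 400)]
recovered = [(96, 192, 320, 416)]
print(max_box_deviation(original, recovered))      # 20
print(within_grid_tolerance(original, recovered))  # True
```

Passing cells=2 gives the looser 64-pixel threshold mentioned above.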
If you don’t mind training a model from scratch, there are some arguments that can be changed to modify the default properties of Kosmos2Processor. I can share more information if you are up for this.