LayoutLMv3 inference - bboxes are incorrect

Hello all,

I’m working through the True_inference_with_LayoutLMv2ForTokenClassification_+_Gradio_demo.ipynb notebook, but the predicted bounding boxes don’t match the example output. I suspected the way the coordinates are encoded, but I tried tinkering with them and couldn’t get it sorted.

I had to use a few workarounds to get the code to run on my company’s AWS account (I have to run it in a SageMaker notebook, can’t use PyTesseract inside the notebook, and can’t download packages directly from GitHub), so I think one of those might have caused the issue, but I can’t figure out exactly what.

Does anyone know what I’m doing wrong?

Expected on left; actual on right.

Details:

Expected bboxes output
[[99, 140, 123, 153],
 [99, 168, 144, 181],
 [839, 198, 850, 208],
 [99, 223, 168, 234],
 [98, 348, 164, 361],
 [120, 374, 189, 388],
 [122, 404, 188, 415],
 [124, 433, 189, 444],
 [99, 293, 156, 306],
 [513, 374, 575, 388],
 [513, 404, 575, 417],
 [509, 432, 578, 445],
 [444, 587, 470, 598],
 [448, 601, 466, 614],
 [452, 617, 466, 628],
 [448, 649, 466, 662],
 [449, 666, 466, 676],
 [452, 680, 469, 693],
 [448, 697, 468, 708],
 [895, 774, 923, 879],
 [112, 77, 136, 87],
 [139, 77, 163, 87],
 [163, 77, 180, 88],
 [205, 77, 229, 88],
 [232, 77, 250, 88],
 [299, 77, 323, 88],
 [332, 78, 356, 88],
 [367, 80, 400, 88],
 [490, 78, 564, 89],
 [574, 78, 610, 89],
 [854, 78, 868, 93],
 [868, 81, 896, 92],
 [298, 142, 311, 152],
 [316, 142, 331, 150],
 [335, 141, 387, 154],
 [298, 169, 312, 180],
 [314, 170, 331, 178],
 [335, 170, 380, 181],
 [298, 225, 326, 233],
 [330, 223, 366, 234],
 [370, 223, 432, 234],
 [438, 225, 485, 235],
 [489, 225, 498, 236],
 [501, 223, 543, 234],
 [547, 225, 595, 235],
 [602, 225, 640, 235],
 [653, 223, 720, 234],
 [725, 226, 779, 233],
 [755, 170, 785, 181],
 [789, 170, 799, 181],
 [750, 198, 774, 206],
 [781, 198, 799, 208],
 [598, 169, 631, 182],
 [635, 170, 649, 180],
 [602, 197, 629, 208],
 [633, 197, 651, 208],
 [99, 320, 140, 331],
 [143, 320, 160, 331],
 [161, 320, 217, 333],
 [220, 320, 271, 333],
 [275, 320, 346, 333],
 [348, 320, 384, 331],
 [388, 320, 444, 333],
 [444, 321, 462, 332],
 [466, 320, 513, 333],
 [204, 376, 255, 387],
 [298, 377, 309, 387],
 [311, 377, 350, 388],
 [364, 376, 375, 387],
 [204, 404, 240, 417],
 [295, 404, 309, 417],
 [311, 402, 348, 416],
 [355, 404, 368, 415],
 [370, 404, 379, 415],
 [201, 432, 251, 443],
 [295, 432, 309, 443],
 [311, 432, 348, 445],
 [355, 430, 366, 444],
 [587, 377, 632, 388],
 [635, 377, 672, 388],
 [710, 374, 724, 387],
 [725, 377, 762, 388],
 [774, 376, 785, 387],
 [584, 402, 631, 416],
 [635, 404, 671, 417],
 [710, 405, 721, 415],
 [725, 405, 762, 415],
 [774, 404, 785, 417],
 [587, 433, 629, 444],
 [710, 432, 721, 443],
 [725, 432, 762, 443],
 [775, 433, 785, 443],
 [139, 500, 194, 511],
 [201, 501, 290, 512],
 [295, 500, 332, 511],
 [336, 500, 396, 511],
 [400, 500, 537, 511],
 [542, 499, 599, 510],
 [603, 499, 635, 514],
 [639, 499, 700, 514],
 [102, 514, 128, 527],
 [133, 514, 147, 527],
 [152, 514, 222, 528],
 [225, 514, 309, 527],
 [314, 514, 336, 525],
 [342, 513, 377, 527],
 [381, 515, 423, 525],
 [428, 514, 503, 527],
 [509, 513, 566, 526],
 [570, 514, 594, 525],
 [596, 515, 652, 525],
 [655, 514, 712, 527],
 [717, 513, 763, 526],
 [129, 570, 171, 583],
 [176, 570, 194, 583],
 [197, 571, 263, 584],
 [322, 573, 379, 581],
 [440, 559, 462, 567],
 [470, 557, 488, 567],
 [435, 571, 490, 582],
 [545, 570, 586, 581],
 [588, 570, 606, 580],
 [608, 571, 675, 582],
 [724, 571, 785, 581],
 [824, 556, 848, 569],
 [858, 557, 880, 568],
 [827, 570, 877, 583],
 [102, 588, 148, 598],
 [159, 587, 204, 598],
 [102, 602, 147, 615],
 [159, 603, 214, 613],
 [102, 617, 129, 630],
 [131, 619, 148, 629],
 [148, 619, 194, 630],
 [102, 648, 128, 662],
 [127, 649, 163, 660],
 [99, 663, 122, 678],
 [125, 665, 152, 678],
 [99, 683, 159, 694],
 [103, 698, 136, 708],
 [140, 698, 167, 709],
 [326, 585, 350, 598],
 [351, 587, 362, 598],
 [367, 587, 377, 597],
 [330, 603, 348, 614],
 [351, 603, 359, 611],
 [360, 603, 374, 614],
 [331, 617, 348, 628],
 [351, 617, 359, 628],
 [360, 617, 370, 628],
 [323, 648, 350, 661],
 [358, 649, 366, 660],
 [370, 648, 379, 662],
 [323, 665, 351, 679],
 [355, 666, 364, 679],
 [368, 665, 381, 679],
 [326, 682, 348, 696],
 [354, 680, 362, 693],
 [367, 682, 376, 693],
 [322, 697, 350, 710],
 [355, 697, 364, 710],
 [370, 696, 380, 710],
 [734, 908, 770, 923],
 [770, 909, 779, 922],
 [793, 911, 810, 922],
 [811, 909, 824, 923],
 [826, 909, 874, 923]]
Actual bboxes output
[[0.0, 0.0, 0.0, 0.0],
 [37.7, 9.0, 64.844, 78.0],
 [23.374, 8.0, 117.624, 79.0],
 [23.374, 9.0, 160.602, 79.0],
 [13.572, 8.0, 190.008, 80.0],
 [18.096, 8.0, 208.858, 80.0],
 [41.47, 8.0, 279.734, 81.0],
 [18.85, 8.0, 326.48199999999997, 81.0],
 [21.866, 11.0, 487.084, 81.0],
 [19.604, 28.0, 50.518, 132.0],
 [8.293999999999999, 8.0, 180.206, 143.0],
 [29.406, 10.0, 190.762, 143.0],
 [27.144, 28.0, 54.288, 160.0],
 [16.587999999999997, 8.0, 170.404, 171.0],
 [24.882, 10.0, 190.762, 172.0],
 [22.619999999999997, 10.0, 58.058, 322.0],
 [6.032, 8.0, 83.694, 322.0],
 [31.668000000000003, 9.0, 91.988, 322.0],
 [27.898, 8.0, 126.67200000000001, 323.0],
 [38.454, 9.0, 157.58599999999998, 322.0],
 [19.604, 8.0, 198.30200000000002, 323.0],
 [42.224000000000004, 9.0, 220.922, 322.0],
 [26.390000000000004, 10.0, 265.40799999999996, 322.0],
 [35.438, 9.0, 58.058, 350.0],
 [35.438, 8.0, 70.876, 378.0],
 [28.652, 9.0, 116.87, 378.0],
 [24.882, 28.0, 150.04600000000002, 368.0],
 [23.374, 9.0, 174.928, 378.0],
 [3.77, 28.0, 206.596, 368.0],
 [34.684, 9.0, 291.798, 378.0],
 [30.16, 28.0, 328.74399999999997, 368.0],
 [30.16, 28.0, 358.904, 368.0],
 [3.016, 9.0, 405.65200000000004, 378.0],
 [21.866, 9.0, 411.684, 378.0],
 [4.524, 9.0, 441.844, 378.0],
 [36.192, 8.0, 70.122, 406.0],
 [18.85, 8.0, 116.87, 406.0],
 [4.524, 10.0, 170.404, 405.0],
 [24.128, 10.0, 174.174, 405.0],
 [11.309999999999999, 8.0, 203.58, 406.0],
 [34.684, 8.0, 291.798, 406.0],
 [30.16, 28.0, 328.74399999999997, 396.0],
 [20.358, 28.0, 360.412, 396.0],
 [28.652, 9.0, 404.898, 406.0],
 [5.2780000000000005, 8.0, 441.09, 407.0],
 [35.438, 8.0, 70.876, 434.0],
 [25.636000000000003, 10.0, 116.87, 434.0],
 [27.898, 9.0, 170.404, 433.0],
 [4.524, 8.0, 203.58, 434.0],
 [34.684, 8.0, 291.798, 434.0],
 [23.374, 9.0, 334.776, 433.0],
 [4.524, 8.0, 404.898, 434.0],
 [23.374, 8.0, 410.17600000000004, 434.0],
 [5.2780000000000005, 7.0, 441.09, 434.0],
 [31.668000000000003, 10.0, 79.92399999999999, 502.0],
 [51.272000000000006, 10.0, 114.608, 502.0],
 [19.604, 10.0, 168.89600000000002, 502.0],
 [33.175999999999995, 10.0, 192.27, 502.0],
 [76.908, 10.0, 229.216, 502.0],
 [31.668000000000003, 10.0, 309.14, 502.0],
 [17.342, 10.0, 343.824, 502.0],
 [33.93, 10.0, 364.182, 502.0],
 [13.572, 10.0, 58.812, 516.0],
 [6.032, 7.0, 76.908, 518.0],
 [37.7, 10.0, 88.218, 516.0],
 [45.994, 9.0, 129.688, 516.0],
 [12.818000000000001, 9.0, 179.452, 516.0],
 [17.342, 9.0, 196.04000000000002, 516.0],
 [24.128, 9.0, 217.152, 516.0],
 [42.978, 9.0, 244.29600000000002, 516.0],
 [30.914, 9.0, 290.29, 516.0],
 [12.064, 9.0, 324.21999999999997, 516.0],
 [29.406, 9.0, 340.05400000000003, 516.0],
 [31.668000000000003, 9.0, 372.476, 516.0],
 [25.636000000000003, 9.0, 407.91400000000004, 516.0],
 [24.882, 12.0, 418.47, 911.0],
 [17.342, 10.0, 450.892, 911.0],
 [7.54, 28.0, 460.694, 902.0],
 [23.374, 12.0, 472.004, 912.0],
 [13.572, 99.0, 509.704, 779.0],
 [754.0, 1000.0, 754.0, 1000.0]]
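
One possible reading of the actual output, offered only as a guess: in most rows the first two numbers stay small while the last two grow across and down the page, which would be consistent with a (width, height, left, top) ordering rather than (x0, y0, x1, y1). The sketch below reorders a box under that assumption. The helper name is hypothetical, and it does not address any scaling problem in the x values.

```python
def size_first_to_corners(box):
    """Reorder a (width, height, left, top) box into (x0, y0, x1, y1)."""
    width, height, left, top = box
    return [left, top, left + width, top + height]


# Second row of the actual output, read as (width, height, left, top).
row = [37.7, 9.0, 64.844, 78.0]
corners = size_first_to_corners(row)
```

Under this reading the vertical extent of that row becomes 78 to 87, close to the 77 to 87 of the first header word in the expected output, while the horizontal extent still differs. So reordering alone would not explain everything.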
pip list output
Package                   Version
------------------------- ------------------
absl-py                   2.1.0
accelerate                0.30.0
aiohttp                   3.9.5
aiosignal                 1.3.1
aniso8601                 9.0.1
annotated-types           0.6.0
ansi2html                 1.9.1
antlr4-python3-runtime    4.9.3
anyio                     4.3.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.1
async-lru                 2.0.4
async-timeout             4.0.3
attrs                     23.2.0
autovizwidget             0.21.0
awscli                    1.32.93
Babel                     2.14.0
beautifulsoup4            4.12.3
black                     24.4.2
bleach                    6.1.0
blinker                   1.7.0
bokeh                     3.4.0
boto3                     1.34.93
botocore                  1.34.93
Brotli                    1.1.0
cached-property           1.5.2
captum                    0.6.0
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               2.2.1
colorama                  0.4.4
comm                      0.2.2
contextlib2               21.6.0
contourpy                 1.2.0
cryptography              42.0.5
cycler                    0.12.1
datasets                  2.19.1
debugpy                   1.8.1
decorator                 5.1.1
defusedxml                0.7.1
detectron2                0.6
dill                      0.3.8
docker                    6.1.3
docutils                  0.16
dparse                    0.6.3
entrypoints               0.4
environment-kernels       1.2.0
evaluate                  0.4.2
exceptiongroup            1.2.0
executing                 2.0.1
fastjsonschema            2.19.1
filelock                  3.13.3
Flask                     3.0.2
Flask-RESTful             0.3.10
fonttools                 4.50.0
fqdn                      1.5.1
frozenlist                1.4.1
fsspec                    2024.3.1
fvcore                    0.1.5.post20221221
gmpy2                     2.1.2
google-pasta              0.2.0
grpcio                    1.63.0
gssapi                    1.8.3
gym                       0.26.2
gym-notices               0.0.8
h11                       0.14.0
h2                        4.1.0
hdijupyterutils           0.21.0
hpack                     4.0.0
httpcore                  1.0.4
httpx                     0.27.0
huggingface-hub           0.23.0
hydra-core                1.3.2
hyperframe                6.0.1
idna                      3.6
imageio                   2.34.0
importlib-metadata        6.11.0
importlib_resources       6.4.0
iopath                    0.1.9
ipykernel                 6.29.3
ipython                   8.22.2
ipywidgets                8.1.2
isoduration               20.11.0
itsdangerous              2.1.2
jedi                      0.19.1
Jinja2                    3.1.3
jmespath                  1.0.1
joblib                    1.3.2
json5                     0.9.24
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter                   1.0.0
jupyter_client            8.6.1
jupyter-console           6.6.3
jupyter_core              5.7.2
jupyter-events            0.10.0
jupyter-lsp               2.2.4
jupyter_server            2.13.0
jupyter_server_terminals  0.5.3
jupyterlab                4.1.5
jupyterlab_pygments       0.3.0
jupyterlab_server         2.25.4
jupyterlab_widgets        3.0.10
kiwisolver                1.4.5
krb5                      0.5.1
llvmlite                  0.42.0
Markdown                  3.6
MarkupSafe                2.1.5
matplotlib                3.8.3
matplotlib-inline         0.1.6
mistune                   3.0.2
mpi4py                    3.1.5
mpmath                    1.3.0
multidict                 6.0.5
multiprocess              0.70.16
munkres                   1.1.4
mypy-extensions           1.0.0
nbclient                  0.10.0
nbconvert                 7.16.3
nbformat                  5.10.3
nest_asyncio              1.6.0
networkx                  3.2.1
notebook                  7.1.2
notebook_shim             0.2.4
numba                     0.59.1
numpy                     1.26.4
nvgpu                     0.10.0
nvidia-ml-py              12.535.133
omegaconf                 2.3.0
onnx                      1.16.0
opencv-python             4.9.0.80
overrides                 7.7.0
packaging                 24.0
pandas                    1.5.3
pandocfilters             1.5.0
parso                     0.8.3
pathos                    0.3.2
pathspec                  0.12.1
patsy                     0.5.6
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.2.0
pip                       24.0
pkgutil_resolve_name      1.3.10
platformdirs              4.2.0
plotly                    5.20.0
portalocker               2.8.2
pox                       0.3.4
ppft                      1.7.6.8
prometheus_client         0.20.0
prompt-toolkit            3.0.42
protobuf                  4.25.3
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
py4j                      0.10.9.5
pyarrow                   15.0.2
pyarrow-hotfix            0.6
pyasn1                    0.5.1
pybind11                  2.11.1
pybind11-global           2.11.1
pycocotools               2.0.7
pycparser                 2.21
pydantic                  2.6.4
pydantic_core             2.16.3
pyfunctional              1.5.0
pygame                    2.5.2
Pygments                  2.17.2
pynvml                    11.5.0
pyparsing                 3.1.2
PySocks                   1.7.1
pyspark                   3.3.0
pyspnego                  0.10.2
python-dateutil           2.9.0
python-json-logger        2.0.7
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     25.1.2
qtconsole                 5.5.1
QtPy                      2.4.1
referencing               0.34.0
regex                     2024.5.10
requests                  2.31.0
requests-kerberos         0.14.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rpds-py                   0.18.0
rsa                       4.7.2
ruamel.yaml               0.18.6
ruamel.yaml.clib          0.2.8
s3fs                      0.4.2
s3transfer                0.10.1
safetensors               0.4.3
sagemaker                 2.217.0
sagemaker_pyspark         1.4.5
schema                    0.7.5
scikit-learn              1.4.1.post1
scipy                     1.12.0
seaborn                   0.13.2
Send2Trash                1.8.2
setuptools                69.2.0
shap                      0.40.0
six                       1.16.0
slicer                    0.0.7
smclarify                 0.5
smdebug-rulesconfig       1.0.1
sniffio                   1.3.1
soupsieve                 2.5
sparkmagic                0.21.0
stack-data                0.6.2
statsmodels               0.14.1
sympy                     1.12
tabulate                  0.9.0
tblib                     3.0.0
tenacity                  8.2.3
tensorboard               2.16.2
tensorboard-data-server   0.7.2
termcolor                 2.4.0
terminado                 0.18.1
threadpoolctl             3.4.0
tinycss2                  1.2.1
tokenizers                0.19.1
tomli                     2.0.1
torch                     2.1.0
torch-model-archiver      0.7.1b20230208
torch-workflow-archiver   0.2.12b20240314
torchaudio                2.1.0
torchdata                 0.7.0
torchserve                0.8.2b20230828
torchtext                 0.16.0
torchvision               0.16.0
tornado                   6.4
tqdm                      4.66.2
traitlets                 5.14.2
transformers              4.40.2
triton                    2.1.0
types-python-dateutil     2.9.0.20240316
typing_extensions         4.10.0
typing-utils              0.1.0
tzdata                    2024.1
ujson                     5.9.0
unicodedata2              15.1.0
uri-template              1.3.0
urllib3                   2.2.1
wcwidth                   0.2.13
webcolors                 1.13
webencodings              0.5.1
websocket-client          1.7.0
Werkzeug                  3.0.1
wheel                     0.43.0
widgetsnbextension        4.0.10
xxhash                    3.4.1
xyzservices               2023.10.1
yacs                      0.1.8
yarl                      1.9.4
zipp                      3.17.0