The model doesn´t classify the tokens well. However, If i pass the bboxes and text from my annotations it works properly.
How is this model supposed to be used for inference? Do you need to pass the hand-drawn bboxes and text?
I want to use this model to extract information automatically and if I have to pass these annotations manually it makes no sense.
Maybe I did something wrong at labelling? Should I run the image through tesseract and then label all the bboxes it returns instead of drawing them by hand?
Thanks for your interest in LayoutLMv3. That labelling tool likes nice!
I’d say that you need to make sure that the OCR settings between training and inference should be identical, otherwise the model will not work as expected at inference time.
e.g. are you making sure bounding boxes are provided in the appropriate format during training? I’d check things such as:
Each bounding box should be a normalized version in (x0, y0, x1, y1) format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1) represents the position of the lower right corner.
So make sure to provide the same settings for the bounding boxes during training vs inference, make sure you provide them in the right order (from top left in the document to bottom right), etc.
Hi @WaterKnight, For token classification other than the labels we need to classify, dont we need one more label named “others”.
So when you used labelstudio, did you select the text that belongs only to those labels we need to classify or you also selected the text belongs to “Others” label as well.
I have the same question as @purnasai . When I run train and test on custom dataset, the model performs nicely on test; however, when I follow inference guide using an image very similar to a test image, the model performs poorly. The image output on inference has many “other” bounding boxes and the two classes of labels found, are wrong. I did not annotate “other” labels during labeling and annotating; only used two classes of labels, which are inferred nicely on test set.
When labeling training images, do I need to identify the words I’m not interested in as “other” along with the other two classes that I am interested in?