Invoice Classification with LayoutLMv3 and Label Studio: Simplifying Data Labeling for Precise Results

Hey folks,

I’m starting a project that uses LayoutLMv3 to process invoices. I’m still new to this area, so I could really use some advice. My aim is to extract specific fields from invoices accurately, and I’m using Label Studio for the labeling. Here’s the plan:

Step 1: Extract the basic fields from each invoice.
Step 2: Train a model to recognize different types of invoices based on their content.

Here are my questions:

If I already have all the data I need for Step 1 (and maybe even for Step 2), can I skip manual labeling and use that data directly?
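To make this question concrete, here is a rough sketch of what I mean by using the data directly: matching field values I already have against OCR words and turning the matches into a Label Studio pre-annotation task. It assumes I have OCR words with pixel boxes, that my labeling config uses the names "label" and "image", and that each value is a single word matched by exact text. The function names and the "existing-data" model version are made up for illustration.

```python
def match_fields(ocr_words, known_fields):
    """Pair OCR words with field names when the word text equals a known value.

    ocr_words: list of {"text": str, "box": (x0, y0, x1, y1)} in pixels (assumed OCR output).
    known_fields: {field_name: value} taken from data I already have.
    """
    value_to_label = {v.strip().lower(): k for k, v in known_fields.items()}
    matches = []
    for word in ocr_words:
        label = value_to_label.get(word["text"].strip().lower())
        if label is not None:
            matches.append((word, label))
    return matches


def to_preannotation(image_url, img_w, img_h, ocr_words, known_fields):
    """Build one Label Studio task with predictions (boxes in percent of image size)."""
    results = []
    for word, label in match_fields(ocr_words, known_fields):
        x0, y0, x1, y1 = word["box"]
        results.append({
            "type": "rectanglelabels",
            "from_name": "label",   # assumed control tag name in my labeling config
            "to_name": "image",     # assumed object tag name in my labeling config
            "original_width": img_w,
            "original_height": img_h,
            "value": {
                "x": 100.0 * x0 / img_w,
                "y": 100.0 * y0 / img_h,
                "width": 100.0 * (x1 - x0) / img_w,
                "height": 100.0 * (y1 - y0) / img_h,
                "rotation": 0,
                "rectanglelabels": [label],
            },
        })
    return {
        "data": {"image": image_url},
        "predictions": [{"model_version": "existing-data", "result": results}],
    }
```

Values that span several words (addresses, company names) would need span matching rather than this word-by-word lookup, and I would still expect to review the imported boxes by hand.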

Let’s say I need to label around 7-9 fields per invoice. Do I only need to label the value I’m interested in, or do I have to label its key as well (the full key-value pair)?

Can I plug LayoutLMv3 directly into Label Studio so it learns as I label each invoice, or do I have to go through the extra step of exporting to JSON and training it with a separate script?
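In case it helps frame the question, this is roughly the separate-script route I’m picturing: read the Label Studio JSON export and convert each labeled rectangle (stored as percentages of the image size) into the 0-1000 box coordinates that LayoutLMv3 expects. It assumes rectangle labels, an "image" key in the task data, and one annotation per task. The function names are my own, and the word text would still have to come from OCR.

```python
def percent_box_to_layoutlm(value):
    """Convert a Label Studio percent box to [x0, y0, x1, y1] on a 0-1000 scale."""
    x0 = value["x"]
    y0 = value["y"]
    x1 = x0 + value["width"]
    y1 = y0 + value["height"]

    def scale(percent):
        # 0-100 percent -> 0-1000, clamped so rounding never leaves the valid range
        return max(0, min(1000, int(round(percent * 10))))

    return [scale(x0), scale(y0), scale(x1), scale(y1)]


def export_to_examples(tasks):
    """Turn a Label Studio JSON export (a list of tasks) into per-invoice boxes and labels."""
    examples = []
    for task in tasks:
        boxes, labels = [], []
        # Only the first annotation is used; merging several annotators is out of scope here.
        for annotation in task.get("annotations", [])[:1]:
            for item in annotation.get("result", []):
                if item.get("type") != "rectanglelabels":
                    continue
                boxes.append(percent_box_to_layoutlm(item["value"]))
                labels.append(item["value"]["rectanglelabels"][0])
        examples.append({
            "image": task["data"]["image"],  # assumed data key from my labeling config
            "boxes": boxes,
            "labels": labels,
        })
    return examples
```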

What’s the best way to train the model for Step 2, where it learns to classify invoices based on their content?

Any tips or tricks would be awesome. Thanks a bunch in advance!