Hello All,
I want to do token classification on some PDFs. They are insurance documents, so I converted each page to an image and annotated the images.
I used Labelbox for annotation. A sample of my annotations looks like this:
[
  {
    "data_row": {
      "id": "example123",
      "external_id": "SampleDocument_page_1.png",
      "row_data": "https://example.com/document/page1.png"
    },
    "media_attributes": {
      "height": 1700,
      "width": 2200,
      "mime_type": "image/png",
      "exif_rotation": "1"
    },
    "projects": {
      "project123": {
        "name": "Sample Project",
        "labels": [
          {
            "id": "label123",
            "annotations": {
              "objects": [
                {
                  "feature_id": "feature123",
                  "name": "Sample Text 1",
                  "bounding_box": {
                    "top": 100,
                    "left": 150,
                    "height": 50,
                    "width": 200
                  }
                },
                {
                  "feature_id": "feature124",
                  "name": "Sample Text 2",
                  "bounding_box": {
                    "top": 200,
                    "left": 250,
                    "height": 50,
                    "width": 200
                  }
                }
                // ... more objects
              ]
              // ... possibly more annotation types
            }
          }
          // ... possibly more labels
        ]
      }
      // ... possibly more projects
    }
    // ... possibly more top-level keys
  }
  // ... possibly more data rows
]
I want to use LayoutLMv3. I have been training it on SageMaker but keep running into errors about the type of the bounding boxes. Can someone suggest a good way to move forward?
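In case it helps narrow things down: as far as I understand, LayoutLMv3 expects each box as four integers [x0, y0, x1, y1] scaled to the 0-1000 range, while my Labelbox export stores top/left/height/width in pixels. Below is a minimal sketch of the conversion I think is needed. The function names are hypothetical (they are not part of Labelbox or transformers), and it assumes the export has the shape shown above with the page size under "media_attributes".

```python
def labelbox_to_layoutlm_box(bounding_box, page_width, page_height):
    """Convert a Labelbox pixel box (top/left/height/width) to
    [x0, y0, x1, y1] integers on a 0-1000 scale."""
    x0 = bounding_box["left"]
    y0 = bounding_box["top"]
    x1 = x0 + bounding_box["width"]
    y1 = y0 + bounding_box["height"]

    def scale(value, size):
        # Truncate to int and clamp so the result stays inside 0..1000.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


def boxes_from_data_row(data_row):
    """Collect (name, normalized_box) pairs from one exported data row.

    Assumes the structure shown in the sample export above.
    """
    width = data_row["media_attributes"]["width"]
    height = data_row["media_attributes"]["height"]
    results = []
    for project in data_row["projects"].values():
        for label in project["labels"]:
            for obj in label["annotations"]["objects"]:
                box = labelbox_to_layoutlm_box(obj["bounding_box"], width, height)
                results.append((obj["name"], box))
    return results
```

With the first sample object above (page 2200 x 1700), this gives [68, 58, 159, 88]. If the boxes reach the model as floats, as pixel values above 1000, or still in top/left/height/width order, that could plausibly explain the box-type errors, but I have not confirmed this.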