Hi, is it possible to train Donut for table data extraction, and if so, how would one build the gt_parse in metadata.jsonl to include rowspans and colspans?
I want to extract all rows / columns of all tables in the image.
For example this table:
Is this format allowed, or is it better to specify explicitly when a cell spans multiple rows/columns?
{
  table: [
    {
      rows: [
        [
          { 0: 'Day', 1: 'Seminar', 2: 'Seminar', 3: 'Seminar' },
          { 0: 'Day', 1: 'Schedule', 2: 'Schedule', 3: 'Topic' },
          { 0: 'Day', 1: 'Begin', 2: 'End', 3: 'Topic' },
        ],
        [
          { 0: 'Monday', 1: '8:00 a.m.', 2: '5:00 p.m.', 3: 'Introduction to XML' },
          { 0: 'Monday', 1: '8:00 a.m.', 2: '5:00 p.m.', 3: 'Validity DTD and Relax NG' },
        ]
      ],
    },
  ],
  ...
}
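For comparison, here is a rough sketch of the other option I have in mind, where each cell appears once and carries explicit span information. The key names `text`, `rowspan` and `colspan`, the `cell` and `expand` helpers, and the file name `table_0001.png` are all made up for illustration; nothing here is defined by Donut. I am also assuming the metadata.jsonl layout described in the Donut README (one JSON object per line with `file_name` and a `ground_truth` string that contains `gt_parse`), and I have flattened the header and body groups into a single list of rows to keep it short.

```python
import json


def cell(text, rowspan=1, colspan=1):
    """Build one cell dict; span keys (hypothetical names) are only added when > 1."""
    c = {"text": text}
    if rowspan > 1:
        c["rowspan"] = rowspan
    if colspan > 1:
        c["colspan"] = colspan
    return c


# Each cell is listed once, in the row where it starts (like <td> in HTML).
rows = [
    [cell("Day", rowspan=3), cell("Seminar", colspan=3)],
    [cell("Schedule", colspan=2), cell("Topic", rowspan=2)],
    [cell("Begin"), cell("End")],
    [cell("Monday", rowspan=2), cell("8:00 a.m.", rowspan=2),
     cell("5:00 p.m.", rowspan=2), cell("Introduction to XML")],
    [cell("Validity DTD and Relax NG")],
]


def expand(rows):
    """Expand spans into a dense grid where spanned text is repeated per position."""
    grid = {}
    for r, row in enumerate(rows):
        col = 0
        for c in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, col) in grid:
                col += 1
            for dr in range(c.get("rowspan", 1)):
                for dc in range(c.get("colspan", 1)):
                    grid[(r + dr, col + dc)] = c["text"]
            col += c.get("colspan", 1)
    n_cols = max(cc for _, cc in grid) + 1
    return [[grid.get((r, cc), "") for cc in range(n_cols)]
            for r in range(len(rows))]


# One metadata.jsonl line; "ground_truth" is itself a JSON string (assumed layout).
record = {
    "file_name": "table_0001.png",  # hypothetical image name
    "ground_truth": json.dumps({"gt_parse": {"table": [{"rows": rows}]}}),
}
line = json.dumps(record)
print(line)
```

Calling `expand(rows)` gives back the repeated-text grid from my first format, so the two representations carry the same information; the question is which one the model learns more easily.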
Thanks