Fine-tuning Donut for full table data extraction

Hi, is it possible to train Donut for table data extraction, and if so, how would one build the metadata.jsonl gt_parse to include rowspans and colspans?
I want to extract all rows/columns of all tables in the image.

For example, this table:

Is this format allowed, or is it a better option to explicitly specify when a row/col spans multiple rows/cols?

{
 "table": [
  {
   "rows": [
    [
     { "0": "Day", "1": "Seminar", "2": "Seminar", "3": "Seminar" },
     { "0": "Day", "1": "Schedule", "2": "Schedule", "3": "Topic" },
     { "0": "Day", "1": "Begin", "2": "End", "3": "Topic" }
    ],
    [
     { "0": "Monday", "1": "8:00 a.m.", "2": "5:00 p.m.", "3": "Introduction to XML" },
     { "0": "Monday", "1": "8:00 a.m.", "2": "5:00 p.m.", "3": "Validity DTD and Relax NG" }
    ]
   ]
  }
 ],
...
}
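An alternative I've been considering (just a sketch, not validated against Donut training): store each cell once with explicit "rowspan"/"colspan" fields instead of repeating spanned values in every slot, and serialize that into a metadata.jsonl line. The file name and field names here are my own invention:

```python
import json

# Hypothetical gt_parse: one entry per physical cell, with explicit spans,
# rather than duplicating "Day"/"Seminar" across every spanned slot.
table = {
    "table": [
        {
            "rows": [
                [
                    {"text": "Day", "rowspan": 3, "colspan": 1},
                    {"text": "Seminar", "rowspan": 1, "colspan": 3},
                ],
                [
                    {"text": "Schedule", "rowspan": 1, "colspan": 2},
                    {"text": "Topic", "rowspan": 2, "colspan": 1},
                ],
                [
                    {"text": "Begin"},
                    {"text": "End"},
                ],
            ]
        }
    ]
}

# One metadata.jsonl line: "ground_truth" is a JSON *string* containing gt_parse.
line = json.dumps({
    "file_name": "table_0.jpg",  # hypothetical file name
    "ground_truth": json.dumps({"gt_parse": table}),
})
print(line)
```

The trade-off is that the decoder must learn to emit span counts as numbers, but the annotation stays compact and unambiguous for merged cells.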

Thanks

Hi, did you build this model? How did it turn out? What accuracy do you get on tables? I searched around and found suggestions that for tables it helps to include bounding boxes with the cell values in the Donut dataset, but I haven't tested that. Do you have any experience with it by now?

I've done some trials using PubTables-1M with a structure like this:
{"file_name": "PMC1064074_table_0.jpg", "ground_truth": "{\"gt_parse\": {\"cells\": [{\"row_0_col_0\": \"Kinetic parameter\"}, {\"row_0_col_1\": \"ND\"}, {\"row_0_col_2\": \"D\"}, {\"row_0_col_3\": \"D + dn-RhoA\"}, {\"row_0_col_4\": \"D + dn-Rac1\"}, {\"row_1_col_0\": \"Vmax\"}, {\"row_1_col_1\": \"19.6 ± 0.75\"}, {\"row_1_col_2\": \"26.2 ± 0.86*\"}, {\"row_1_col_3\": \"31.3 ± 0.88†\"}, {\"row_1_col_4\": \"21.6 ± 0.9\"}, {\"row_2_col_0\": \"K’ for H+\"}, {\"row_2_col_1\": \"0.150 ± 0.02\"}, {\"row_2_col_2\": \"0.113 ± 0.05†\"}, {\"row_2_col_3\": \"0.105 ± 0.07†\"}, {\"row_2_col_4\": \"0.137 ± 0.023\"},
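In case it helps anyone reproduce this, the cells list above can be generated from a plain 2-D grid of strings. A minimal sketch (the helper name is mine, the grid is shortened, and spanned cells are not handled):

```python
import json

def cells_to_gt_parse(grid):
    """Flatten a 2-D list of cell strings into the row_i_col_j format."""
    cells = []
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            cells.append({f"row_{r}_col_{c}": text})
    return {"gt_parse": {"cells": cells}}

grid = [
    ["Kinetic parameter", "ND", "D"],
    ["Vmax", "19.6 ± 0.75", "26.2 ± 0.86*"],
]
record = {
    "file_name": "PMC1064074_table_0.jpg",
    # Donut expects ground_truth to be a JSON-encoded string.
    "ground_truth": json.dumps(cells_to_gt_parse(grid), ensure_ascii=False),
}
print(json.dumps(record, ensure_ascii=False))
```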

It works on images that contain only the table, or by first cropping the tables out of the image and running Donut on just those crops, but I'm not happy with the accuracy, and I get wrong cell coordinates for tables that contain rowspans or colspans. I'm still working on it…
Also, I only trained it on 10k images; I'll try training on the full dataset, but our GPUs are busy with another project right now…

A friend of mine had better results using pero-ocr: extracting word coordinates, assigning each word to a row and column based on those coordinates, and building a JSON from that.
It's still a WIP…
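For what it's worth, a minimal sketch of that coordinate-bucketing idea (function name, tolerance value, and sample boxes are all made up; it only groups words into rows by vertical center and sorts each row left-to-right, and does not handle spans):

```python
def words_to_grid(words, row_tol=10):
    """words: list of (x0, y0, x1, y1, text) boxes from any OCR engine.
    Groups words into rows by vertical center (within row_tol pixels),
    then orders each row left-to-right by x0."""
    rows = []  # list of (row_y_center, [word, ...])
    for w in sorted(words, key=lambda w: (w[1] + w[3]) / 2):
        yc = (w[1] + w[3]) / 2
        if rows and abs(rows[-1][0] - yc) <= row_tol:
            rows[-1][1].append(w)  # same row: close enough vertically
        else:
            rows.append((yc, [w]))  # start a new row
    return [[w[4] for w in sorted(r, key=lambda w: w[0])] for _, r in rows]

words = [
    (0, 0, 40, 10, "Day"), (50, 0, 90, 10, "Topic"),
    (0, 20, 40, 30, "Monday"), (50, 20, 120, 30, "Introduction"),
]
print(words_to_grid(words))  # [['Day', 'Topic'], ['Monday', 'Introduction']]
```

Column assignment for spanned cells would need an extra step, e.g. clustering x-ranges across all rows into column boundaries and marking cells that overlap several of them.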

Has anyone tried this? I'm eager to know whether we can build a dataset for full table extraction, and if so, what the proper format would be when cells span multiple columns and rows.