Model Recommendation for table extraction from PDF


Could you please recommend a model that can extract tables from the attached PDF?

I need to extract the table underneath the red horizontal line (the line does not exist in the actual PDF; I added it to the screenshot).

The model should be able to extract this table, which always has the same format. The only difference is that in different PDFs it can sit at a different distance from the edges of the page (essentially, the margins vary).

The table is normally completed in Word and then converted to PDF, and sometimes the user moves the entire table slightly up/down or left/right. But, as mentioned, the table format, the headers (Reference(s), Current year, Previous year), and the rows are always the same.

I converted the PDF to an image and then tried the Donut model, but without success. As you can see in the screenshot, each table is accompanied by field references, and I need to extract those too so that each field can be identified.
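Because the headers never change, one option is to locate the table by its header text instead of by absolute position, which makes margin shifts irrelevant. The sketch below assumes the PDFs still have a text layer (plausible if they are exported straight from Word rather than scanned) and that you already have a list of words with coordinates; a library such as pdfplumber can produce word dicts with `text`, `x0` and `top` keys. It also assumes the field reference sits in the "Reference(s)" column. `extract_rows`, `HEADER_ANCHORS` and the tolerance values are hypothetical names and guesses, not a tested solution:

```python
# Sketch: read a fixed-format table relative to its header row, so the
# result does not depend on where the table sits on the page.
# Assumption: `words` is a list of dicts with "text", "x0" (left edge) and
# "top" (distance from the page top), one dict per word, in PDF points.

# Hypothetical mapping from the first word of each header to a column name.
HEADER_ANCHORS = {"Reference(s)": "reference", "Current": "current", "Previous": "previous"}


def extract_rows(words, row_tolerance=3.0, slack=5.0):
    # Take the top-most occurrence of each header word as the column anchor.
    header = {}
    for w in sorted(words, key=lambda w: w["top"]):
        if w["text"] in HEADER_ANCHORS:
            header.setdefault(HEADER_ANCHORS[w["text"]], w)
    if len(header) != len(HEADER_ANCHORS):
        raise ValueError("table headers not found")
    header_bottom = max(w["top"] for w in header.values())
    # Columns sorted left to right as (start x, column name).
    columns = sorted((w["x0"], name) for name, w in header.items())

    def column_of(x0):
        # Right-most column whose header starts at or before the word
        # (with some slack for alignment differences).
        found = None
        for start, name in columns:
            if x0 + slack >= start:
                found = name
        return found

    # Words below the header row, grouped into rows by vertical position.
    body = sorted(
        (w for w in words if w["top"] > header_bottom + row_tolerance),
        key=lambda w: (w["top"], w["x0"]),
    )
    rows = []
    for w in body:
        if rows and abs(w["top"] - rows[-1]["top"]) <= row_tolerance:
            rows[-1]["words"].append(w)
        else:
            rows.append({"top": w["top"], "words": [w]})

    # Join each row's words per column and key the result by reference.
    table = {}
    for row in rows:
        cells = {"reference": [], "current": [], "previous": []}
        for w in row["words"]:
            col = column_of(w["x0"])
            if col is not None:
                cells[col].append(w["text"])
        table[" ".join(cells["reference"])] = (
            " ".join(cells["current"]),
            " ".join(cells["previous"]),
        )
    return table
```

Shifting every word by the same offset (a different margin) yields the same dictionary, since only positions relative to the headers are used. As written, anything below the table on the same page would also be picked up, so a real version would need a stop condition such as the last expected reference.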

Thank you in advance for your help.
