Model Recommendation for table extraction from PDF

DoubleCortado · October 26, 2023, 10:40am

Hello.

Could you please recommend a model that can extract tables from the attached PDF?

I need to extract the table underneath the red horizontal line (the line doesn’t normally exist in the PDF; I added it to the screenshot).

The model should be able to extract this table, which will always be in the same format. The only difference is that in different PDFs it can be positioned differently in terms of distance from the edge of the page (so basically the size of the margins).

This table is normally completed in Word and then converted to PDF, and sometimes a user will move the entire table a little bit up/down or left/right. But as I said, the format of the table, the headers (Reference(s), Current year, Previous year) and the rows will always be the same.
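Because the headers never change, one way to cope with the shifting position is to identify the table by its header row rather than by page coordinates. The sketch below is only an illustration of that idea, not something tried in this thread: it assumes the PDF still has a text layer (it was exported from Word) and that some extractor has already produced candidate tables as lists of rows. The names `find_table_by_headers`, `normalize` and `EXPECTED_HEADERS` are hypothetical.

```python
from typing import List, Optional, Sequence

Row = List[Optional[str]]
Table = List[Row]

# Headers taken from the post; everything else here is an assumption.
EXPECTED_HEADERS = ("Reference(s)", "Current year", "Previous year")


def normalize(cell: Optional[str]) -> str:
    """Lower-case a cell and collapse whitespace so layout noise is ignored."""
    return " ".join((cell or "").split()).lower()


def find_table_by_headers(
    tables: Sequence[Table],
    expected: Sequence[str] = EXPECTED_HEADERS,
) -> Optional[Table]:
    """Return the rows below the first row that contains every expected header.

    The match does not depend on where the table sits on the page, and extra
    columns (such as a field-reference column) are allowed.
    """
    wanted = [normalize(h) for h in expected]
    for table in tables:
        for i, row in enumerate(table):
            cells = [normalize(c) for c in row]
            if all(h in cells for h in wanted):
                return table[i + 1:]
    return None
```

The candidate tables could come, for example, from pdfplumber's `page.extract_tables()`, which returns each table as a list of rows of cell strings. Whether that extractor picks up this particular layout cleanly is untested here.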

I converted the PDF to an image and then tried the Donut model, but without success. As you can see in the screenshot, each table is accompanied by a field reference, and I need to extract those too so each field can be identified.

Thank you in advance for your help.

aryanbaghla · June 23, 2024, 5:46am

Did you find a better solution for this?

ommmmo · June 27, 2024, 1:09am

You can try this model?

I created a tool to help users find models and datasets based on their specific request. There are other recommendations as well, so maybe you can give it a try: www.modelmatch.ai

pasqualeb · July 14, 2024, 8:20am

I tried modelmatch.io and it’s a really great tool! Great job!
