Extracting information from bills, tax statements, etc: What ML model to use?

anoldmaninthesea · April 9, 2022, 10:40am

I have a bunch of documents such as bank statements, utility bills, personal expenditure invoices, etc. The range of document types is very broad. Some of these files are saved as images, others as PDFs.

So far, my tactic has been to OCR all the documents and then use regexes to extract information (I would like to extract dates, quantities/amounts, and entities). However, this hasn’t worked out well so far…
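For reference, the OCR-then-regex approach described above might look roughly like the minimal sketch below. The sample text, the two patterns, and the function names are illustrative assumptions, not the actual setup; it also assumes day-first dates.

```python
import re
from datetime import datetime

# Hypothetical OCR output; real OCR text is usually much noisier.
ocr_text = """ACME Utilities
Invoice date: 09/04/2022
Due date: 2022-04-23
Total due: EUR 1,234.56
Late fee: $15.00
"""

# Matches ISO dates (2022-04-23) and slash-separated dates (09/04/2022).
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")
# Matches an amount preceded by a currency marker, with optional thousands separators.
AMOUNT_PATTERN = re.compile(r"(?:EUR|USD|\$|€)\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")


def extract_dates(text):
    """Return all dates found in the text, assuming day-first slash dates."""
    dates = []
    for raw in DATE_PATTERN.findall(text):
        for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                dates.append(datetime.strptime(raw, fmt).date())
                break
            except ValueError:
                continue
    return dates


def extract_amounts(text):
    """Return all currency-prefixed amounts as floats."""
    return [float(m.replace(",", "")) for m in AMOUNT_PATTERN.findall(text)]


print(extract_dates(ocr_text))
print(extract_amounts(ocr_text))
```

Each new layout, date format, or currency convention needs another pattern, which is why this gets brittle across a broad range of document types.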

Thus, I was wondering what other possibilities there were in the Machine Learning field.

I’ve looked into Named-Entity Recognition (NER) deep learning models such as those on Hugging Face, but maybe I’m missing some alternatives.

What alternatives are there to NER?
Which NER models have reported good results for this type of task?

Any help would be appreciated.

mrm8488 · April 22, 2022, 10:48pm

Check out LayoutLM models
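As a rough illustration of how these differ from text-only NER: LayoutLM-family models take each OCR'd word together with its bounding box, with coordinates scaled to a 0-1000 range relative to the page size. A minimal sketch of that scaling step follows; the function name, the sample words, and the page dimensions are hypothetical.

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR words with pixel boxes on a 1000x2000 page.
page_width, page_height = 1000, 2000
words = [
    ("Total", (100, 200, 300, 400)),
    ("1,234.56", (600, 200, 900, 400)),
]

# Pair each word with its normalized box, ready to pass alongside the tokens.
normalized = [(word, normalize_bbox(box, page_width, page_height)) for word, box in words]
print(normalized)
```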

anoldmaninthesea · April 23, 2022, 12:07pm

Thanks for the info

MbarkiDAzizAI · August 28, 2024, 1:07pm

Check out spaCy's NER model. I can help you with that!
