Extracting information from bills, tax statements, etc.: What ML model to use?

I have a bunch of documents such as bank statements, utility bills, personal expense invoices, etc. The range of document types is very broad. Some of these files are saved as images, others as PDFs.

So far, my approach has been to OCR all the documents and then use regexes to extract information (I would like to extract dates, quantities/amounts, and entities). However, this hasn’t worked very well…
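To make the baseline concrete, here is a minimal sketch of the OCR-then-regex step, assuming the OCR text is already available as a string. The patterns, the `extract_fields` helper, and the sample text are hypothetical illustrations, not the exact ones I use; they only cover a couple of date and currency formats, which is a large part of why this approach breaks down on varied documents.

```python
import re

# Hypothetical patterns: they cover only a few common formats.
# Dates like 12/03/2021, 12-03-21, or ISO-style 2021-03-15.
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")
# Amounts with a leading currency symbol, optional thousands separators,
# and an optional two-digit decimal part, e.g. $1,234.56.
AMOUNT_RE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?")


def extract_fields(ocr_text):
    """Return all date-like and amount-like strings found in the OCR text."""
    return {
        "dates": DATE_RE.findall(ocr_text),
        "amounts": AMOUNT_RE.findall(ocr_text),
    }


# Made-up OCR output for illustration.
sample = "Invoice date: 12/03/2021\nTotal due: $1,234.56\nPaid on 2021-03-15"
fields = extract_fields(sample)
print(fields)
```

Note that this finds strings that look like dates or amounts but says nothing about which one is the invoice date or the total, and it has no way to find entities at all.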

So I was wondering what other options there are in the machine learning field.

I’ve looked into deep learning Named Entity Recognition (NER) models, such as those on Hugging Face, but maybe I’m missing some alternatives.

  1. What alternatives are there to NER?
  2. Which NER models have reported good results for this type of task?

Any help would be appreciated.


Check out the LayoutLM family of models, which use the position of each word on the page in addition to the OCR text.
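To give an idea of what the input looks like: as far as I know, the LayoutLM models on Hugging Face expect one bounding box per word, with coordinates rescaled to a 0-1000 range regardless of page size. Below is a rough sketch of that preprocessing step only. The `ocr_words` list and the page dimensions are made-up values standing in for whatever your OCR engine returns, and the model call itself is left out.

```python
def normalize_bbox(bbox, page_width, page_height):
    """Rescale an (x0, y0, x1, y1) pixel box to the 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output: (word, pixel box) pairs for one page.
ocr_words = [
    ("Total", (100, 700, 180, 720)),
    ("$42.00", (400, 700, 480, 720)),
]
page_width, page_height = 800, 1000  # assumed page size in pixels

# Parallel lists of words and normalized boxes, ready for tokenization.
words = [word for word, _ in ocr_words]
boxes = [normalize_bbox(box, page_width, page_height) for _, box in ocr_words]
print(list(zip(words, boxes)))
```

The words and boxes would then be tokenized together and passed to a token-classification head that has been fine-tuned on labeled documents similar to yours.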


Thanks for the info :wink: