Google Document AI Alternative

I previously developed a custom document processor using Google Document AI for my business. Recently, I’ve established on-premises infrastructure to host our applications. Now, I aim to recreate locally the functionality of our existing processor, which currently parses and extracts information from PDF documents with 100% accuracy.

I’m seeking recommendations for models and components available on Hugging Face that could help achieve this goal. While I’ve experimented with various Python libraries, most were primarily focused on OCR rather than comprehensive document processing. I would greatly appreciate any suggestions, feedback, or advice on this matter.

Thank you for your assistance.

I don’t know if this fits your application, but this is the most common model I’ve seen lately for OCR.

Thank you, @John6666, I appreciate the response - I know OCR isn’t too exciting.

I just looked into both of these models, and it appears that PDF isn’t supported. I reached out to the developers to confirm.

Thanks again for looking out!

If PDF is your input format, there are libraries such as docTR (linked below) that accept it directly. If you need PDF as output, that conversion is usually something people handle manually as a separate step.

https://mindee.github.io/doctr/index.html
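
In case it helps, here is a minimal sketch of how a PDF could be run through docTR locally and flattened into lines of text. It assumes the `python-doctr` package is installed with a deep learning backend, and that the pretrained weights can be downloaded once on first use. The helper names (`lines_from_export`, `ocr_pdf`), the `min_confidence` threshold, and the file name `invoice.pdf` are my own placeholders, not part of docTR. This only covers text recognition; mapping the text to named fields, as a custom Document AI processor does, would still need a separate step.

```python
from typing import Any, Dict, List


def lines_from_export(export: Dict[str, Any], min_confidence: float = 0.0) -> List[List[str]]:
    """Flatten a docTR-style export dict into text lines, grouped per page.

    Expects the nesting pages -> blocks -> lines -> words, where each word
    is a dict with a "value" and (optionally) a "confidence".
    Words below min_confidence are dropped (hypothetical filter).
    """
    pages: List[List[str]] = []
    for page in export.get("pages", []):
        page_lines: List[str] = []
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["value"]
                    for word in line.get("words", [])
                    if word.get("confidence", 1.0) >= min_confidence
                ]
                if words:
                    page_lines.append(" ".join(words))
        pages.append(page_lines)
    return pages


def ocr_pdf(pdf_path: str) -> Dict[str, Any]:
    """Run docTR's pretrained OCR pipeline on a PDF and return its export dict."""
    # Imported here so the flattening helper above works without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    doc = DocumentFile.from_pdf(pdf_path)   # renders each PDF page to an image
    model = ocr_predictor(pretrained=True)  # text detection + recognition
    result = model(doc)
    return result.export()


if __name__ == "__main__":
    import sys

    # "invoice.pdf" is a placeholder path; pass your own file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "invoice.pdf"
    for page_number, page_lines in enumerate(lines_from_export(ocr_pdf(path), 0.5), start=1):
        print(f"--- page {page_number} ---")
        for text in page_lines:
            print(text)
```

Each word in the export also carries its position on the page, so field extraction rules could be built on top of that rather than on the plain text alone.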