Google Document AI Alternative

fakebizprez · September 28, 2024, 6:29pm

I previously developed a custom document processor using Google Document AI for my business. Recently, I’ve established on-premises infrastructure to host our applications. Now, I aim to recreate locally the functionality of our existing processor, which currently parses and extracts information from PDF documents with 100% accuracy.

I’m seeking recommendations for models and components available on Hugging Face that could help achieve this goal. While I’ve experimented with various Python libraries, most were primarily focused on OCR rather than comprehensive document processing. I would greatly appreciate any suggestions, feedback, or advice on this matter.

Thank you for your assistance.

John6666 · September 29, 2024, 6:41am

I don’t know if this fits your application, but this is the most common model I’ve seen lately for OCR.

fakebizprez · October 6, 2024, 10:47am

Thank you, @John6666, I appreciate the response - I know OCR isn’t too exciting.

I just looked into both of these models, and it appears that PDF isn’t supported. I reached out to the developers to confirm.

Thanks again for looking out!

John6666 · October 6, 2024, 10:55am

If PDF is your input format, it looks like there are libraries like the one below that can read it directly. If you mean PDF as the output format, that is probably something that is usually converted manually.

https://mindee.github.io/doctr/index.html
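
For anyone who wants to try the library linked above on a PDF, here is a minimal sketch, assuming docTR is installed with one of its deep learning backends. The function names `run_doctr_on_pdf` and `lines_from_export` are hypothetical helpers written for this illustration, not part of docTR. Only `DocumentFile.from_pdf`, `ocr_predictor`, and `export()` come from the library.

```python
from typing import Any


def run_doctr_on_pdf(pdf_path: str) -> dict[str, Any]:
    """Run docTR's pretrained OCR pipeline on a PDF and return its export dict."""
    # Imported lazily so the helper below can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    pages = DocumentFile.from_pdf(pdf_path)      # PDF pages rendered as images
    predictor = ocr_predictor(pretrained=True)   # default detection + recognition models
    return predictor(pages).export()             # nested dict: pages > blocks > lines > words


def lines_from_export(export: dict[str, Any]) -> list[list[str]]:
    """Collect the recognized text lines of each page from the export dict."""
    pages_out = []
    for page in export.get("pages", []):
        page_lines = []
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                if words:  # skip lines with no recognized words
                    page_lines.append(" ".join(words))
        pages_out.append(page_lines)
    return pages_out
```

Calling `lines_from_export(run_doctr_on_pdf("invoice.pdf"))` (the file name is a placeholder) should give one list of text lines per page. This only covers the OCR step. Pulling specific fields out of those lines, as a custom Document AI processor does, would still need a separate extraction step on top.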
