I previously built a custom document processor for my business using Google Document AI. I’ve since set up on-premises infrastructure to host our applications, and I now want to reproduce the processor’s functionality locally. The existing processor parses PDF documents and extracts information from them with 100% accuracy.
I’m looking for recommendations for models and components available on Hugging Face that could help achieve this. I’ve experimented with various Python libraries, but most of them focus on OCR rather than end-to-end document processing. I would greatly appreciate any suggestions, feedback, or advice.
Thank you for your assistance.