If you're talking about extracting content from PDFs, current open-source models are already quite good. I think VLMs, and pipelines that combine a VLM with an LLM, like the ones I introduce below, are quite practical. There are also several VLMs that specialize in PDF OCR, and searching past posts on the forum should turn up various leads.
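As a rough sketch of what "VLM combined with LLM" can look like in practice: the VLM transcribes each page image, and the LLM then cleans up that transcription. Everything named here (`extract_pdf_pages`, `PageResult`, `fake_vlm`, `fake_llm`) is hypothetical and only for illustration. The two stand-in functions do not call any real model; you would replace them with whichever VLM and LLM you pick, and rendering the PDF into page images is assumed to have happened beforehand.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: each stage is a plain callable,
# so any real model wrapper can be plugged in later.
VlmFn = Callable[[bytes], str]  # page image bytes -> raw transcription
LlmFn = Callable[[str], str]    # raw transcription -> cleaned-up text


@dataclass
class PageResult:
    page_number: int
    raw_text: str
    clean_text: str


def extract_pdf_pages(page_images: List[bytes], vlm: VlmFn, llm: LlmFn) -> List[PageResult]:
    """Run every page image through the VLM, then post-process with the LLM."""
    results = []
    for number, image in enumerate(page_images, start=1):
        raw = vlm(image)   # stage 1: OCR-like transcription
        clean = llm(raw)   # stage 2: cleanup / structuring
        results.append(PageResult(number, raw, clean))
    return results


# Stand-ins so the sketch runs without any model (assumption: not real models).
def fake_vlm(image: bytes) -> str:
    # Pretends the "image" bytes already contain the page text.
    return image.decode("utf-8")


def fake_llm(text: str) -> str:
    # Pretends to clean the text by collapsing runs of whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    pages = [b"Invoice   No. 42\n", b"Total:\t100 USD"]
    for result in extract_pdf_pages(pages, fake_vlm, fake_llm):
        print(result.page_number, result.clean_text)
```

Keeping the two stages as separate callables makes it easy to swap in a PDF-specialized VLM later without touching the rest of the pipeline.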
In addition, if you need more specialized advice, I recommend asking on the HF Discord.