I’m looking for the most effective way to convert academic papers in PDF format into text, preferably with an open-source solution. So far I’ve tried Mathpix, which is quite impressive: it converts papers to Markdown and renders formulas as LaTeX. However, it isn’t open-source.
One open-source OCR option I know of is EasyOCR. I also noticed that Pix2Struct was recently made available on Hugging Face. I’m open to any suggestions. What would you consider the best open-source tool or method for converting a paper into text?