Seeking Recommendations for the Best Open Source Paper-to-Text Conversion Methods

yachty66 · May 16, 2023, 3:12am

I’m looking for the most effective way to convert academic papers (in PDF format) into text, with a focus on open-source solutions. So far, I’ve tried Mathpix, which is fairly impressive: it converts papers to Markdown and turns formulas into LaTeX. However, it isn’t open-source.

One open-source OCR option I’m aware of is EasyOCR. I also noticed that Pix2Struct was recently made available on Hugging Face. I’m open to any suggestions or recommendations. What would you consider the best open-source tool or method for converting a paper into text?
