I’m building my own LLM and I’m very interested in document parsing. I want to extract specific information from various PDF and DOCX files. Plain text works fine, but getting information out of images and tables has been difficult. Is there a community thread that tracks the state of the art (SOTA) on this, or a paper with code that covers it?