I'm building my own LLM and I'm very interested in how to parse documents. I want to extract specific information from various PDF and DOCX files. Extracting the plain text works fine, but getting the information out of images and tables has been difficult. Is there a community thread that tracks the state of the art (SOTA) on this, or a paper with code that covers extracting this kind of information?