Genealogy repair and maintenance

mmhzlrj · October 14, 2023, 9:39am

Dataset: mmhzlrj/Genealogy
from datasets import load_dataset
dataset = load_dataset("mmhzlrj/Genealogy")

Hello! I am a beginner in AI. I have learned that LayoutLMv3 is a powerful multimodal model for document understanding, and I hope to use it for something very meaningful. However, I don't know how to fine-tune the model and run it on images to complete the genealogy repair and maintenance task I have in mind.

To-do list:

1. Recognize the text layout of the genealogy and convert the scanned images into PDFs with selectable text.
2. Recognize the content and generate a person card for each individual (name: birth-death dates, burial site, education, descendants, life events, and any other labels that can be recognized).
3. Link the person cards into a graphical family tree.
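
To make the second and third items more concrete, here is a minimal sketch of what a card and the tree linkage might look like in plain Python. Everything in it is an assumption for illustration: the `PersonCard` class, its field names, and the helpers `build_index`, `find_roots`, and `render_tree` are hypothetical and do not come from LayoutLMv3 or from the dataset. It also assumes that names are unique, which a real genealogy would probably not guarantee, so a generated ID would be a safer key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PersonCard:
    """One person extracted from the genealogy (hypothetical schema)."""
    name: str
    birth: Optional[str] = None
    death: Optional[str] = None
    burial_site: Optional[str] = None
    education: Optional[str] = None
    events: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)  # names of descendants


def build_index(cards: List[PersonCard]) -> Dict[str, PersonCard]:
    """Map each name to its card (assumes names are unique)."""
    return {card.name: card for card in cards}


def find_roots(index: Dict[str, PersonCard]) -> List[str]:
    """Return people who are not listed as anyone's child."""
    all_children = {child for card in index.values() for child in card.children}
    return [name for name in index if name not in all_children]


def render_tree(index: Dict[str, PersonCard], name: str,
                depth: int = 0, seen: Optional[Set[str]] = None) -> List[str]:
    """Return indented text lines for the subtree rooted at `name`."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against accidental cycles in the extracted data
        return []
    seen.add(name)
    indent = "  " * depth
    card = index.get(name)
    if card is None:  # a child was mentioned but has no card of its own
        return [f"{indent}{name} (no card)"]
    lines = [f"{indent}{card.name} ({card.birth or '?'}-{card.death or '?'})"]
    for child in card.children:
        lines.extend(render_tree(index, child, depth + 1, seen))
    return lines


if __name__ == "__main__":
    # Placeholder people, not taken from the dataset.
    demo = build_index([
        PersonCard("Person A", birth="1800", death="1870",
                   children=["Person B", "Person C"]),
        PersonCard("Person B", birth="1830"),
    ])
    for root in find_roots(demo):
        print("\n".join(render_tree(demo, root)))
```

The indented text output is only a stand-in for a real graphical tree. The same parent-to-children links could later be passed to a graph drawing tool once the recognition step produces reliable cards.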

Thanks
