How to represent paginated documents as a single instance of training data for whole document classification?

pierreguillou · May 23, 2022, 6:04pm

I have the same objective as you: classifying multi-page image documents (for example, PDF documents whose pages can be converted to images) by using both the layout and the text at the same time.
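To make the question in the title concrete, here is a minimal sketch of one way to represent such a document as a single training instance. It assumes each PDF page has already been converted to an image and run through OCR, giving one page-level record (words plus bounding boxes) per page, and that the document-level label is repeated on every page. `PageRecord`, `DocumentInstance` and `group_pages` are hypothetical names for illustration, not part of any library.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical page-level record: one row per page after PDF -> image -> OCR.
@dataclass
class PageRecord:
    doc_id: str
    page_number: int
    words: list[str]
    boxes: list[tuple[int, int, int, int]]  # (x0, y0, x1, y1), assumed normalized to 0-1000
    label: str  # document-level label, assumed identical on every page


# Hypothetical document-level instance: the unit the classifier is trained on.
@dataclass
class DocumentInstance:
    doc_id: str
    label: str
    pages: list[PageRecord] = field(default_factory=list)


def group_pages(records: list[PageRecord]) -> list[DocumentInstance]:
    """Group page rows into one instance per document, with pages in reading order."""
    by_doc: dict[str, list[PageRecord]] = defaultdict(list)
    for rec in records:
        by_doc[rec.doc_id].append(rec)

    instances = []
    for doc_id, pages in by_doc.items():
        pages.sort(key=lambda p: p.page_number)
        labels = {p.label for p in pages}
        # Every page of a document should carry the same document label.
        if len(labels) != 1:
            raise ValueError(f"inconsistent labels for {doc_id}: {labels}")
        instances.append(DocumentInstance(doc_id, labels.pop(), pages))
    return instances


# Usage: pages may arrive out of order and interleaved across documents.
records = [
    PageRecord("doc1", 2, ["Total"], [(100, 200, 180, 220)], "invoice"),
    PageRecord("doc1", 1, ["Invoice"], [(50, 40, 200, 80)], "invoice"),
    PageRecord("doc2", 1, ["Dear"], [(60, 60, 120, 80)], "letter"),
]
docs = group_pages(records)
```

Keeping the pages together in one record (rather than flattening them into independent rows) makes it impossible for pages of the same document to end up in different train/validation splits, and leaves the choice of how to combine pages to the model side.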

@nielsr of HuggingFace has worked on document image classification (see his GitHub), but in that work I did not find a notebook or script that classifies a document using all of its pages.

DiT (paper):
- performing inference with DiT for document image classification
LayoutLM (paper):
- fine-tuning LayoutLMForSequenceClassification on the RVL-CDIP dataset
LayoutLMv2 (paper):
- fine-tuning LayoutLMv2ForSequenceClassification on RVL-CDIP

On using LayoutLMv2 for single-page document classification, there is also a publication by Karndeep Singh that looks similar to @nielsr's notebook.

I also searched arxiv.org for whole-document classification and found the paper “Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning”, which was updated in January 2022 (see image below). Unfortunately, I did not find any associated code or notebook.

Back to LayoutLMv2 (or now LayoutLMv3): how do you think we could use it for multi-page document classification? @nielsr, have you already worked on this subject? Thanks.
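One simple baseline I can think of (my own assumption, not something taken from the notebooks or the paper above): since these models encode a bounded sequence (typically 512 tokens), a whole multi-page document usually does not fit into one input, so we could run a single-page classifier such as `LayoutLMv2ForSequenceClassification` on each page separately and then pool the per-page logits into one document prediction. The sketch below shows only the pooling step in plain Python; `aggregate_pages` and its `strategy` values are hypothetical names, and the per-page logits are made-up numbers.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_pages(page_logits: list[list[float]], strategy: str = "mean") -> list[float]:
    """Hypothetical helper: combine per-page logits into document-level probabilities.

    "mean": average each class logit over pages (every page votes equally).
    "max":  keep the highest logit per class (one strongly typed page can decide).
    """
    if not page_logits:
        raise ValueError("document has no pages")
    n_classes = len(page_logits[0])
    if any(len(page) != n_classes for page in page_logits):
        raise ValueError("all pages must have the same number of class logits")

    if strategy == "mean":
        pooled = [sum(page[c] for page in page_logits) / len(page_logits) for c in range(n_classes)]
    elif strategy == "max":
        pooled = [max(page[c] for page in page_logits) for c in range(n_classes)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return softmax(pooled)


def argmax(values: list[float]) -> int:
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for a 3-page document and 2 classes: page 1 strongly
# suggests class 0, pages 2 and 3 mildly suggest class 1.
pages = [[3.0, 0.0], [-2.0, 1.0], [-2.0, 1.0]]
mean_probs = aggregate_pages(pages, "mean")
max_probs = aggregate_pages(pages, "max")
```

With these numbers the two strategies disagree: mean pooling picks class 1 (the majority of pages), while max pooling picks class 0 (driven by the single confident first page). Both treat pages as independent; a learned alternative would be to pool per-page embeddings and train a classification head on top, which lets the model weight pages itself but needs all pages of a document in the same batch.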
