LayoutLM training - use unmatched samples?

weplayrockandroll · September 4, 2024, 10:22am

Hey there,

a general question on training Document Question Answering models:
During data preprocessing and tokenization, we use some subfinder method to find the ground truth answer in the context.

When we find that answer in the context (OCR’d document files), we call it a match.

Question:
What to do with those samples where we do not find a match?
One can still tokenize the question, the context, and the image and, with the CLS index set as the answer position, train on that sample.
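
To make the setup concrete, here is a minimal sketch of the labeling step I mean. It is not taken from any library: `find_sublist` and `label_positions` are hypothetical helper names, and `word_ids` is assumed to be a simplified per-token list that holds the context word index for context tokens and `None` for special and question tokens. With a real tokenizer you would likely have to build that list from its word and sequence IDs. The CLS token is assumed to sit at index 0.

```python
from typing import List, Optional, Tuple


def find_sublist(words: List[str], answer: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) word indices of the first case-insensitive match, or None."""
    words_l = [w.lower() for w in words]
    answer_l = [a.lower() for a in answer]
    n = len(answer_l)
    if n == 0:
        return None
    for i in range(len(words_l) - n + 1):
        if words_l[i:i + n] == answer_l:
            return i, i + n - 1
    return None


def label_positions(
    words: List[str],
    answer: List[str],
    word_ids: List[Optional[int]],
    cls_index: int = 0,  # assumption: CLS is the first token
) -> Tuple[int, int, bool]:
    """Map a matched word span to token positions; fall back to CLS if unmatched."""
    match = find_sublist(words, answer)
    if match is None:
        # No match in the OCR'd context: point both labels at CLS.
        return cls_index, cls_index, False
    start_word, end_word = match
    token_idxs = [
        i for i, w in enumerate(word_ids)
        if w is not None and start_word <= w <= end_word
    ]
    if not token_idxs:
        # Match exists but was truncated out of this token window.
        return cls_index, cls_index, False
    return token_idxs[0], token_idxs[-1], True


# Toy example: [CLS] q q [SEP] Invoice No. 123 ##45 Total 99.50 [SEP]
words = ["Invoice", "No.", "12345", "Total", "99.50"]
word_ids = [None, None, None, None, 0, 1, 2, 2, 3, 4, None]

matched = label_positions(words, ["12345"], word_ids)
unmatched = label_positions(words, ["2024"], word_ids)
```

The returned boolean flag makes it easy to either keep the unmatched samples (with CLS as the target) or filter them out before training, which is exactly the choice I am asking about.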

Happy to hear your thoughts!
