Hey there,
a general question on training Document Question Answering models:
During data preprocessing and tokenization, we use a subfinder method (a sub-list search) to locate the ground-truth answer in the context.
When we find that answer in the context (OCR’d document files), we call it a match.
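To make the setup concrete, here is a minimal sketch of the kind of subfinder I mean. It assumes the context is a list of OCR words and the answer is split on whitespace; the function name, the return format, and the lowercasing are just my choices for illustration, not a fixed API.

```python
def subfinder(words, answer_words):
    """Return (match, start_idx, end_idx) for the first occurrence of
    answer_words inside words, or (None, -1, -1) if there is no match."""
    if not answer_words:
        return None, -1, -1
    n = len(answer_words)
    # Slide a window of the answer's length over the context words.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == answer_words:
            return answer_words, i, i + n - 1
    return None, -1, -1


# Hypothetical OCR output, lowercased before matching.
words = [w.lower() for w in ["Invoice", "date", ":", "12", "March", "2021"]]

print(subfinder(words, "12 march".split()))  # (['12', 'march'], 3, 4)
print(subfinder(words, "13 march".split()))  # (None, -1, -1)
```

An exact match like this is strict: OCR errors or different tokenization of punctuation are typical reasons for a sample ending up without a match.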
Question:
What should we do with samples where we do not find a match?
One option is to still tokenize the question, the context and the image, set the CLS index as the answer's start and end position, and train on those samples as well.
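As a rough sketch of that option: the helper below maps a word-level match to token positions and falls back to the CLS index when there is no match. The name `answer_token_span` and its arguments are hypothetical; `word_ids` and `sequence_ids` are assumed to be plain lists of the kind a fast tokenizer can provide (word index per token, and 0 for question / 1 for context / None for special tokens), and CLS is assumed to sit at position 0.

```python
def answer_token_span(word_ids, sequence_ids, match, cls_index=0):
    """Map a word-level match (word_start, word_end) to token-level
    (start_position, end_position). Falls back to the CLS index when
    match is None or the answer words are not among the context tokens
    (for example because the context was truncated)."""
    if match is None:
        return cls_index, cls_index
    word_start, word_end = match
    # Only look at context tokens; question tokens reuse the same word ids.
    context_tokens = [i for i, s in enumerate(sequence_ids) if s == 1]
    start = next((i for i in context_tokens if word_ids[i] == word_start), None)
    end = next((i for i in reversed(context_tokens) if word_ids[i] == word_end), None)
    if start is None or end is None:
        return cls_index, cls_index
    return start, end


# Toy encoding: [CLS] what date ? [SEP] invoice date : 12 mar ##ch [SEP]
sequence_ids = [None, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, None]
word_ids = [None, 0, 1, 2, None, 0, 1, 2, 3, 4, 4, None]

print(answer_token_span(word_ids, sequence_ids, (3, 4)))  # (8, 10)
print(answer_token_span(word_ids, sequence_ids, None))    # (0, 0)
```

The alternative would be to drop the unmatched samples entirely. Which of the two is better is exactly what I am unsure about.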
Happy to hear your thoughts!