Creating a docvqa dataset - gt_parses

samitizerxu · January 30, 2023, 2:10pm

@nielsr I’m currently going through your donut vqa fine-tuning experiment, and I’m confused about the gt_parses column.

What is the correct format for multiple questions for each answer? Your tutorial shows the formatting for multiple answers for each question but not the converse.
I noticed that in your docvqa_1200_examples_donut dataset, most rows only have one ground truth in the gt_parses list. Moreover, rows that have more than one ground truth actually have the question and answer repeated. Is this by design, or is there something wrong?

Thanks!

nielsr · June 11, 2024, 5:16pm

Hi,

The only reason the Donut authors created gt_parses (rather than gt_parse) is that images in the DocVQA dataset have multiple annotations (multiple human annotators were used to create question-answer pairs for each image).
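
To make the difference between the two keys concrete, here is a minimal sketch of reading either one into a plain list of annotations. The helper name `load_gt_parses`, the assumption that the ground truth is stored as a JSON string, and the sample question/answer values are all made up for illustration; only the key names gt_parse and gt_parses come from this thread.

```python
import json


def load_gt_parses(ground_truth: str) -> list:
    """Return the annotations in a ground-truth JSON string as a list.

    Hypothetical helper: accepts either a "gt_parses" list or a single
    "gt_parse" dict and always returns a list of dicts.
    """
    parsed = json.loads(ground_truth)
    if "gt_parses" in parsed:
        if not isinstance(parsed["gt_parses"], list):
            raise TypeError("gt_parses must be a list")
        return parsed["gt_parses"]
    if not isinstance(parsed.get("gt_parse"), dict):
        raise TypeError("expected a gt_parse dict or a gt_parses list")
    return [parsed["gt_parse"]]


# Illustrative (invented) annotations for one image.
example = json.dumps(
    {
        "gt_parses": [
            {"question": "What is the date?", "answer": "January 5"},
            {"question": "Who is the sender?", "answer": "J. Smith"},
        ]
    }
)
parses = load_gt_parses(example)
single = load_gt_parses(json.dumps({"gt_parse": {"question": "q", "answer": "a"}}))
```

With this shape, a single annotation and several annotations can be handled by the same downstream code, since both come back as a list.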

Regarding rows having the same question and answer repeated: that might be an issue with this specific dataset. It definitely shouldn’t be like that.
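
If the repeated entries are indeed unintended, one possible cleanup is to drop exact duplicates from each row's gt_parses list while keeping the original order. This is only a sketch: `dedupe_gt_parses` is a hypothetical helper, and the sample row below is invented, not taken from the dataset.

```python
import json


def dedupe_gt_parses(gt_parses: list) -> list:
    """Drop exact duplicate annotations, preserving first-seen order."""
    seen = set()
    unique = []
    for parse in gt_parses:
        # Dicts are not hashable, so use a canonical JSON string as the key.
        key = json.dumps(parse, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(parse)
    return unique


# Invented row in which the first question-answer pair appears twice.
row = [
    {"question": "q1", "answer": "a1"},
    {"question": "q1", "answer": "a1"},
    {"question": "q2", "answer": "a2"},
]
cleaned = dedupe_gt_parses(row)
```

Entries that differ in any field (for example, the same question with a different answer) are kept, so distinct annotations are not lost.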
