Help needed fine-tuning Pix2Struct on a DocVQA-style dataset

Hi all,

I have created a custom dataset in the DocVQA format, but it does not include token IDs. Is there a starter script that can help me fine-tune the existing DocVQA-trained Pix2Struct checkpoint? A starter script for fine-tuning the base model on DocVQA would also work.
There is a starter template in the Pix2Struct GitHub repository that includes code for various dataset preprocessors, but it does not help in my case.