Hi all,
Can someone share fine-tuning tips for Pix2Struct, which takes multimodal (image + text) inputs and produces output through a text decoder? I'm particularly interested in fine-tuning a Pix2Struct model on a DocVQA-style dataset.
Thanks in advance for the help.