But from my understanding, DePlot's pretraining renders the input text ("generate the underlying data") as a header on top of the input image. If we set processor.image_processor.is_vqa = False, the image won't be preprocessed correctly, because render_header is never called. The model then receives inputs that differ from what it was trained on, which leads to incorrect results.
So I don't think this is a proper solution. I'm still looking for the correct one and will post it here once I find it.