Deplot is a VQA model, so you need to render a question or a specific task directly on the image, as in the snippet here: google/deplot · Hugging Face
This is different from the image captioning task, where the input is the image only and you’re trying to predict a caption for that image.
I think that would work, yes! To double-check, I would also ask the authors directly by opening an issue in the Hub repo of deplot.
I will also ask the authors on my side and let you know!
Hey @sinchir0, thanks for looking into deplot! I’m an author.
Yes, indeed: we always rendered the header “Generate underlying data table of the figure below:” during deplot training. That said, in this case I imagine sending just an empty string would also work.
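For anyone landing here later, a minimal sketch of passing that header through the processor, which renders the text onto the image before the model sees it. It assumes the `transformers` Pix2Struct classes; the helper names `load_deplot` and `chart_to_table` are made up for illustration, and passing `header=""` is the untested alternative mentioned above.

```python
# Header text the authors rendered on every image during training.
DEPLOT_HEADER = "Generate underlying data table of the figure below:"
MODEL_ID = "google/deplot"


def load_deplot(model_id=MODEL_ID):
    """Load the processor and model (hypothetical helper; downloads weights on first use)."""
    # Imported lazily so chart_to_table can be used without importing transformers up front.
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def chart_to_table(image, processor, model, header=DEPLOT_HEADER, max_new_tokens=512):
    """Hypothetical helper: send the image plus header text through DePlot and decode the output."""
    # The processor draws `header` on top of the image, which is why the text is required.
    inputs = processor(images=image, text=header, return_tensors="pt")
    predictions = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(predictions[0], skip_special_tokens=True)
```

To try it, load a chart with PIL (for example `Image.open("chart.png")`, where the path is a placeholder), call `processor, model = load_deplot()`, and print `chart_to_table(image, processor, model)`.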