Hi @sinchir0
Deplot is a VQA model, so you need to render a question or a specific task directly on the image, as in the snippet here: google/deplot · Hugging Face
This is different from an image captioning task, where the input is the image only and you’re trying to predict a caption given that image.
Hi @sinchir0
I think that would work yes! To double check I would also ask the authors directly by opening an issue in the Hub repo of deplot
I will also ask the authors on my side and let you know!
Thanks!
Hey @sinchir0, thanks for looking into deplot! I’m an author.
Yes, indeed, we always rendered the header “Generate underlying data table of the figure below:” during deplot training, though in this case I imagine sending just an empty string would also work.
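For anyone reading along, here is a minimal sketch of how that header could be passed at inference time. It assumes the `Pix2StructProcessor` and `Pix2StructForConditionalGeneration` classes from `transformers` and the `google/deplot` checkpoint; the helper names `build_processor_kwargs` and `extract_table` are hypothetical and only for illustration, not part of any library.

```python
# Header the deplot author says was rendered on every training image.
DEPLOT_HEADER = "Generate underlying data table of the figure below:"


def build_processor_kwargs(image, header=DEPLOT_HEADER):
    """Hypothetical helper: keyword arguments for a Pix2StructProcessor call.

    The processor receives the chart image together with the header text,
    which it renders on the image (the VQA-style input described above).
    """
    return {"images": image, "text": header, "return_tensors": "pt"}


def extract_table(image, model_id="google/deplot", max_new_tokens=512):
    """Hypothetical helper: run deplot on a PIL image and return the decoded text."""
    # Imported lazily so build_processor_kwargs works without transformers installed.
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(**build_processor_kwargs(image))
    predictions = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(predictions[0], skip_special_tokens=True)
```

With a chart loaded via `PIL.Image.open(...)`, calling `extract_table(image)` should return the predicted data table as a string. Passing `header=""` to `build_processor_kwargs` is one way to try the empty-string variant mentioned above.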
@ybelkada @fl399
Thank you! My understanding is now very clear. Now I can proceed with the fine-tuning of deplot!
“To double check I would also ask the authors directly by opening an issue in the Hub repo of deplot”
You’re right, I should have asked in the Hub issue.
I have posted an additional question on the Hub issue about the format of the input data table for fine-tuning deplot, if you would like to check it out.