ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got

But as I understand it, DePlot pretraining renders the input text “generate the underlying data” as a header on the input image. If we set processor.image_processor.is_vqa = False, the image won’t be preprocessed correctly, because render_header is never called. The processed input therefore ends up incorrect.

So I don’t think this is a proper solution. I’m still exploring the correct one and will post later.
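
To make the concern about the header concrete, here is a rough sketch of what "rendering the text as a header" means for the image the model sees. This is not the transformers implementation of render_header. The function name render_header_sketch and the band_height parameter are made up for illustration, and it assumes Pillow is installed and uses its default font.

```python
from PIL import Image, ImageDraw


def render_header_sketch(image, header, band_height=40):
    """Illustrative only: stack a white text band on top of the image.

    Hypothetical helper, not the library's render_header.
    """
    # White band as wide as the image, with the prompt text drawn on it.
    band = Image.new("RGB", (image.width, band_height), "white")
    ImageDraw.Draw(band).text((5, 5), header, fill="black")

    # New canvas: header band on top, original image below it.
    out = Image.new("RGB", (image.width, image.height + band_height), "white")
    out.paste(band, (0, 0))
    out.paste(image, (0, band_height))
    return out


chart = Image.new("RGB", (200, 100), "blue")  # stand-in for a chart image
with_header = render_header_sketch(chart, "generate the underlying data")
print(chart.size, with_header.size)
```

The point is only that the image with the header has different content and dimensions from the bare image, so skipping this step gives the model an input unlike the one described for pretraining.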