What is ViTImageProcessor doing?

Hi @raygx!

If you’re applying augmentation to the images, I’d recommend not using the image processors at all! As you note, they’re pretty slow (something we’re trying to work on), and working directly with tf.image will be a lot easier and faster.

We have an example of training transformers models with TensorFlow for image classification here: transformers/examples/tensorflow/image-classification/run_image_classification.py at main · huggingface/transformers · GitHub
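As a rough idea of what a tf.image-only pipeline could look like, here is a minimal sketch. The `preprocess` function, the 224 image size, and the 0.5 mean/std are assumptions for illustration (they match the values commonly used with ViT checkpoints, but check the preprocessor config of your own checkpoint). The final transpose assumes the model expects channels-first `pixel_values`.

```python
import tensorflow as tf

# Assumed values: verify against your checkpoint's preprocessor config.
IMAGE_SIZE = 224
MEAN = tf.constant([0.5, 0.5, 0.5])
STD = tf.constant([0.5, 0.5, 0.5])


def preprocess(image, label, training=True):
    """Hypothetical helper: resize, augment, normalize one HWC uint8 image."""
    # uint8 [0, 255] -> float32 [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, (IMAGE_SIZE, IMAGE_SIZE), method="bilinear")
    if training:
        # Example augmentation; swap in whatever tf.image ops you need.
        image = tf.image.random_flip_left_right(image)
    # Per-channel normalization (broadcasts over the last axis).
    image = (image - MEAN) / STD
    # HWC -> CHW, assuming the model takes channels-first input.
    image = tf.transpose(image, (2, 0, 1))
    return image, label


# Usage with tf.data on dummy data of a fixed size.
images = tf.zeros((8, 300, 400, 3), dtype=tf.uint8)
labels = tf.zeros((8,), dtype=tf.int32)
dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
)
```

Because everything runs inside `tf.data`, the resizing and augmentation are executed as graph ops rather than per-image Python calls, which is where most of the speedup should come from.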

  1. The resize methods in TensorFlow and PIL work a bit differently (even though both were bilinear; how do I know this? After I set do_resize=False I got similar results).

Yes, unfortunately there isn’t a 1:1 correspondence with resizing algorithms across frameworks. As we import models from different frameworks (tf, pt, jax) and the image processors are meant to be agnostic to this, we can’t always resolve the differences.

  1. I can’t reconstruct the output from ImageProcessor with np.reshape and tf.reshape. The reshaping method used by ImageProcessor works differently, such that reconstruction is not possible. I tried all of the ‘C’, ‘F’, and ‘A’ orders in np.reshape.

Could you provide an example of how the image processor is being called and how the outputs are being reshaped?
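In the meantime, one thing worth ruling out: if the processor output is channels-first (channels, height, width) while the original image is channels-last, then a reshape cannot recover the image, because reshape only reinterprets the flat element order and never moves axes. A transpose is needed instead. The sketch below uses a small synthetic array as a stand-in for the processor output, so it is an assumption about the cause rather than a diagnosis of your code.

```python
import numpy as np

height, width, channels = 4, 5, 3

# Original image in channels-last (H, W, C) layout.
hwc = np.arange(height * width * channels).reshape(height, width, channels)

# Stand-in for the processor output: same pixels, channels-first (C, H, W).
chw = np.transpose(hwc, (2, 0, 1))

# Reshape keeps the flat element order, so no order flag restores HWC.
reshape_matches = {
    order: np.array_equal(chw.reshape((height, width, channels), order=order), hwc)
    for order in ("C", "F", "A")
}

# Transposing the axes back does restore the original layout.
restored = np.transpose(chw, (1, 2, 0))
transpose_matches = np.array_equal(restored, hwc)

print(reshape_matches)
print(transpose_matches)
```

If the images were also rescaled and normalized, the values would additionally need to be un-normalized (multiply by the std, add the mean, scale back to 0-255) before they look like the input.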