What is ViTImageProcessor doing?

amyeroberts · April 18, 2024, 12:13pm

If you’re applying augmentation to the images, I’d recommend not using the image processors at all! As you note, they’re pretty slow (something we’re trying to work on), and working directly with tf.image will be a lot easier and faster.
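If you do skip the image processor, the steps other than resizing still have to be reproduced by hand. Here is a minimal sketch in NumPy, assuming the checkpoint uses a rescale factor of 1/255, mean = std = 0.5 per channel, and channels-first output. Those values are assumptions, so check the preprocessor_config.json of your checkpoint. `preprocess_like_vit` is a hypothetical helper name, not part of transformers.

```python
import numpy as np

# Assumed values: check preprocessor_config.json for your checkpoint.
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def preprocess_like_vit(image_uint8: np.ndarray) -> np.ndarray:
    """Rescale, normalize and move channels first for an already-resized image.

    image_uint8: array of shape (height, width, 3) with values in [0, 255].
    Returns a float32 array of shape (3, height, width).
    """
    pixels = image_uint8.astype(np.float32) / 255.0  # rescale to [0, 1]
    pixels = (pixels - IMAGE_MEAN) / IMAGE_STD       # per-channel normalization
    return np.transpose(pixels, (2, 0, 1))           # HWC -> CHW


# Random stand-in for an image that has already been resized to 224x224.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
pixel_values = preprocess_like_vit(image)
print(pixel_values.shape, pixel_values.dtype)
```

The same arithmetic can be written with TensorFlow ops inside a tf.data pipeline, which keeps augmentation and preprocessing in one place.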

We have an example of training transformers models with TensorFlow for image classification here: examples/tensorflow/image-classification/run_image_classification.py (main branch of huggingface/transformers on GitHub)

The resize methods from TensorFlow and PIL work a bit differently, even when both are set to bilinear. How do I know this? After I set do_resize=False, I got similar results.

Yes, unfortunately there isn’t a 1:1 correspondence with resizing algorithms across frameworks. As we import models from different frameworks (tf, pt, jax) and the image processors are meant to be agnostic to this, we can’t always resolve the differences.
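To illustrate how two "bilinear" resizes can disagree, here is a one-dimensional sketch in NumPy. It assumes one plausible source of the mismatch: one implementation interpolates only between the two nearest samples, while the other widens the triangle filter by the downscale factor (antialiasing). Both functions are hypothetical illustrations, not the actual PIL or TensorFlow code.

```python
import numpy as np


def bilinear_point_sample(signal: np.ndarray, out_len: int) -> np.ndarray:
    """Bilinear resize using only the two nearest input samples
    (half-pixel centers, no antialiasing)."""
    in_len = len(signal)
    scale = in_len / out_len
    # Center of each output pixel, expressed in input coordinates.
    coords = np.clip((np.arange(out_len) + 0.5) * scale - 0.5, 0, in_len - 1)
    left = np.floor(coords).astype(int)
    right = np.minimum(left + 1, in_len - 1)
    frac = coords - left
    return (1 - frac) * signal[left] + frac * signal[right]


def bilinear_antialiased(signal: np.ndarray, out_len: int) -> np.ndarray:
    """Triangle filter whose support grows with the downscale factor,
    so more than two input samples can contribute to each output."""
    in_len = len(signal)
    scale = in_len / out_len
    support = max(scale, 1.0)
    positions = np.arange(in_len)
    out = np.empty(out_len)
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5
        weights = np.clip(1 - np.abs(positions - center) / support, 0, None)
        out[i] = np.sum(weights * signal) / np.sum(weights)
    return out


signal = np.array([0, 255, 255, 0, 0, 0, 255, 0], dtype=np.float64)
point_sampled = bilinear_point_sample(signal, 4)
antialiased = bilinear_antialiased(signal, 4)
print(point_sampled)
print(antialiased)
print(np.allclose(point_sampled, antialiased))
```

When no downscaling happens the two functions agree, which is consistent with the outputs matching once do_resize=False is set.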

I can’t reconstruct the original image from the ImageProcessor output with np.reshape or tf.reshape. The processor seems to rearrange the data in a way that reshaping can’t undo. I tried each of the ‘C’, ‘F’ and ‘A’ orders in np.reshape.

Could you provide an example of how the image processor is being called and how the outputs are being reshaped?
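One possible explanation, sketched below under assumptions: if the output is channels-first and was normalized with mean = std = 0.5, then going back to an image needs an axis permutation (transpose) plus de-normalization, and no reshape order can do that. The array here is a random stand-in built by hand, not a real processor output. A real output also has a leading batch dimension, so you would index it first, e.g. pixel_values[0].

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # H, W, C

# Stand-in for the processor output (assumed: channels-first, mean = std = 0.5).
normalized = (image / 255.0 - 0.5) / 0.5                      # H, W, C
pixel_values = np.transpose(normalized, (2, 0, 1))            # C, H, W

# reshape reinterprets the flat element order, so it scrambles the pixels
# whichever order is chosen.
reshape_matches = {}
for order in "CFA":
    attempt = np.reshape(pixel_values, (4, 6, 3), order=order)
    reshape_matches[order] = bool(np.allclose(attempt, normalized))
print(reshape_matches)

# transpose permutes the axes, which is the inverse of HWC -> CHW.
restored = np.transpose(pixel_values, (1, 2, 0))
# Undo the assumed normalization and rescaling.
restored = np.round((restored * 0.5 + 0.5) * 255.0).astype(np.uint8)
print(np.array_equal(restored, image))
```

The TensorFlow equivalent of the axis permutation is tf.transpose with perm=[1, 2, 0].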
