What is ViTImageProcessor doing?

As you can see in the image (on the right), the image is still fairly understandable after applying normalization. However, ViTImageProcessor does something totally different, so I am wondering if anyone can help me understand it. I am trying to fine-tune ViT in TensorFlow with augmentation applied to the images, but training takes about 2 s per batch (regardless of batch size: 32 or 16) because of how slow ViTImageProcessor is.

So another question I'll ask in this thread is: will it work if I apply augmentation (random_brightness, random_contrast, gaussian_noise) to the output of ViTImageProcessor? I am doing this research on my own and don't have exclusive GPU access. The GPU time on Kaggle only allows 2-3 tests each week, and I can't always be in front of a screen to use Google Colab.

P.S. I have tried normalization with mean=0.5 and std=0.5, as used by ViTImageProcessor. The question is now also about image reconstruction with reshape, i.e. (224, 224, 3) to (3, 224, 224) and vice versa.
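For reference, here is roughly what I believe ViTImageProcessor is doing under the hood (my own sketch based on the default config; `vit_preprocess` is my own hypothetical helper, not the actual library code):

```python
import numpy as np
from PIL import Image

def vit_preprocess(image: Image.Image) -> np.ndarray:
    # 1. Resize to 224x224 with PIL's bilinear filter
    image = image.resize((224, 224), resample=Image.BILINEAR)
    arr = np.asarray(image).astype(np.float32)
    # 2. Rescale pixel values from [0, 255] to [0, 1]
    arr = arr / 255.0
    # 3. Normalize with mean=0.5, std=0.5 per channel
    arr = (arr - 0.5) / 0.5
    # 4. Channels first: (224, 224, 3) -> (3, 224, 224) via transpose, not reshape
    return arr.transpose(2, 0, 1)
```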

Ok! I seem to understand what is happening.

  1. The resize methods from TensorFlow and PIL work a bit differently (even when both use bilinear interpolation; how do I know? After setting do_resize=False I got similar results).

  2. I can’t reconstruct the output of ImageProcessor with np.reshape or tf.reshape. The reshaping method used by ImageProcessor works differently, such that reconstruction is not possible. I tried all of the ‘C’, ‘F’, and ‘A’ orders in np.reshape.

So, (1) is all right, but (2) is going to be a problem, isn’t it? It affects the content of the patches and ultimately the results I may achieve.

So, can anyone help me on this?
@sgugger @amyeroberts

Hi @raygx!

If you’re applying augmentation to the images, I’d recommend not using the image processors at all! As you note, they’re pretty slow (something we’re trying to work on), and working directly with tf.image will be a lot easier and faster.
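For example, a rough sketch of doing all of the preprocessing and augmentation with tf.image (the exact augmentation parameters here are just placeholders to adjust for your task):

```python
import tensorflow as tf

def preprocess_and_augment(image, training=True):
    # image: uint8 tensor of shape (H, W, 3)
    image = tf.image.resize(image, (224, 224), method="bilinear")
    image = image / 255.0  # rescale to [0, 1]
    if training:
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
        # gaussian noise
        image = image + tf.random.normal(tf.shape(image), stddev=0.02)
    image = (image - 0.5) / 0.5  # normalize with mean=0.5, std=0.5
    return tf.transpose(image, perm=(2, 0, 1))  # HWC -> CHW for ViT
```

You can then map this over a tf.data.Dataset so the augmentation runs inside the input pipeline instead of per-batch in Python.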

We have an example of training transformers models with tensorflow for image classification here: transformers/examples/tensorflow/image-classification/run_image_classification.py at main · huggingface/transformers · GitHub

  1. The resize methods from TensorFlow and PIL work a bit differently (even when both use bilinear interpolation; how do I know? After setting do_resize=False I got similar results).

Yes, unfortunately there isn’t a 1:1 correspondence with resizing algorithms across frameworks. As we import models from different frameworks (tf, pt, jax) and the image processors are meant to be agnostic to this, we can’t always resolve the differences.
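You can see the mismatch yourself by resizing the same image with both libraries and comparing the outputs (a quick sketch, not from our codebase):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# PIL bilinear resize (what the image processor uses on PIL inputs)
pil_out = np.asarray(
    Image.fromarray(arr).resize((224, 224), resample=Image.BILINEAR),
    dtype=np.float32,
)
# TensorFlow bilinear resize
tf_out = tf.image.resize(arr, (224, 224), method="bilinear").numpy()

# The two results are generally close but not pixel-for-pixel identical
print(np.abs(pil_out - tf_out).max())
```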

  2. I can’t reconstruct the output of ImageProcessor with np.reshape or tf.reshape. The reshaping method used by ImageProcessor works differently, such that reconstruction is not possible. I tried all of the ‘C’, ‘F’, and ‘A’ orders in np.reshape.

Could you provide an example of how the image processor is being called and how the outputs are being reshaped?

@amyeroberts Thanks for your reply! I just figured out what is happening.

But just for the record, I am gonna add this.

I was doing a reshape, but it turns out ImageProcessor does a transpose.
The solution I found for myself is:

normalized = (rescaled_image - 0.5) / 0.5              # mean=0.5, std=0.5
transposed = tf.transpose(normalized, perm=(2, 0, 1))  # HWC -> CHW

which results in

<tf.Tensor: shape=(10,), dtype=float32, numpy=
array([-0.5384153 , -0.56822723, -0.50992393, -0.6224089 , -0.58579427,
       -0.62336934, -0.6540616 ,  0.17835152,  0.34901977,  0.8013607 ],
      dtype=float32)>
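To make the reshape-vs-transpose difference concrete, here is a tiny example (my own illustration): reshape keeps the flat memory order and just reinterprets it, while transpose actually moves elements.

```python
import numpy as np

x = np.arange(12).reshape(2, 2, 3)  # a tiny (H, W, C) "image"

reshaped = x.reshape(3, 2, 2)     # reinterprets the same flat buffer
transposed = x.transpose(2, 0, 1) # actually moves elements to (C, H, W)

print(reshaped[0])    # [[0, 1], [2, 3]]  -- mixes channels together
print(transposed[0])  # [[0, 3], [6, 9]]  -- channel 0 of every pixel
```

That is why no `order` argument to np.reshape could ever reproduce the image processor's output.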

Thanks.