ViTImageProcessor output visualization

Sushmitaupadhyay · March 7, 2024, 6:59am

When I plot the output of ViTImageProcessor with Matplotlib, I see the same image reduced in size and repeated 6 times within a 224x224 frame, each copy with different contrast and intensity.
If I have understood correctly, ViT divides the image into patches, so the entire image should have been divided into small 16x16 patches. But the result does not look like that.
What is the issue here?

Sandy1857 · March 8, 2024, 7:22pm

ViT takes an input of resolution 224x224. The ViTImageProcessor just handles resizing the image to that resolution and normalising it.
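To make the normalisation part concrete, here is a minimal NumPy-only sketch of the rescale-and-normalise step. It assumes the image has already been resized to 224x224 and that mean and std are 0.5 per channel, which is what I believe a default ViTImageProcessor() uses; check image_processor.image_mean and image_processor.image_std on your own instance. The function name and the random stand-in image are made up for illustration.

```python
import numpy as np

# Assumed defaults; verify against image_processor.image_mean / image_std.
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def rescale_and_normalize(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (H, W, 3) uint8 -> (3, H, W) float32."""
    x = image_hwc_uint8.astype(np.float32) / 255.0  # rescale to [0, 1]
    x = (x - IMAGE_MEAN) / IMAGE_STD                # normalise per channel
    return np.moveaxis(x, -1, 0)                    # channels-last -> channels-first


# Random stand-in for an already resized 224x224 RGB image.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

pixel_values = rescale_and_normalize(fake_image)
print(pixel_values.shape, pixel_values.min(), pixel_values.max())
```

With these assumed values the result is channels-first and lies roughly in [-1, 1], so it is not directly displayable: Matplotlib's imshow expects (H, W, 3) and clips float RGB data to [0, 1].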

The 16x16 patches that you mentioned are taken over this processed image, which the ViT model then consumes.
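As a rough illustration of what "patches over the processed image" means, the sketch below cuts a channels-first 224x224 array into non-overlapping 16x16 patches with plain NumPy. This is only meant to show the geometry; as far as I know the model does this internally as part of its patch embedding, not in the image processor. The helper name and the random input are assumptions for the example.

```python
import numpy as np


def split_into_patches(image_chw: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Hypothetical helper: (C, H, W) -> (num_patches, C, patch_size, patch_size)."""
    c, h, w = image_chw.shape
    assert h % patch_size == 0 and w % patch_size == 0
    grid_h, grid_w = h // patch_size, w // patch_size
    # Split the height and width axes into (grid index, offset inside patch).
    patches = image_chw.reshape(c, grid_h, patch_size, grid_w, patch_size)
    # Bring the two grid axes to the front: (grid_h, grid_w, C, p, p).
    patches = patches.transpose(1, 3, 0, 2, 4)
    # Flatten the grid in row-major order.
    return patches.reshape(grid_h * grid_w, c, patch_size, patch_size)


# Random stand-in for one processed image (no batch dimension).
processed = np.random.rand(3, 224, 224).astype(np.float32)
patches = split_into_patches(processed)
print(patches.shape)  # (196, 3, 16, 16)
```

A 224x224 input gives a 14x14 grid, so 196 patches. None of this shows up in the processor output, which is still a single 3x224x224 image.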

Additionally, if you gave the ViTImageProcessor an RGB image, you should get back an RGB image, just with a size of 224x224 and 3 channels. I don’t know how you got a 6-channel image.

Sushmitaupadhyay · March 12, 2024, 5:58am

This is my input image
[image: output]

Sushmitaupadhyay · March 12, 2024, 6:00am

This is my output after passing it through the ViTImageProcessor
[image: output1]
I am visualising the processed image using matplotlib. Is that the issue?

Sandy1857 · March 12, 2024, 4:57pm

I don’t know what code you ran, but could you take the pixel_values you get after passing the image through ViTImageProcessor, remove (squeeze) its batch dimension, build the image from it with PIL's Image.fromarray function, and post the result?

nielsr · March 12, 2024, 7:48pm

Hi,

Here’s how you can visualize the output of ViTImageProcessor:

from transformers import ViTImageProcessor
import numpy as np
import requests
from PIL import Image

image_processor = ViTImageProcessor()

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

pixel_values = image_processor(image, return_tensors="pt").pixel_values

# denormalize the pixel values for visualization purposes
mean = image_processor.image_mean
std = image_processor.image_std

unnormalized_image = (pixel_values[0].numpy() * np.array(std)[:, None, None]) + np.array(mean)[:, None, None]
unnormalized_image = (unnormalized_image * 255).astype(np.uint8)
unnormalized_image = np.moveaxis(unnormalized_image, 0, -1)
unnormalized_image = Image.fromarray(unnormalized_image)

which gives me this:

This is a 224x224 image.
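One thing worth checking if you see small repeated copies: the processor output is channels-first (3, H, W), while Matplotlib and PIL expect channels-last (H, W, 3). I can't tell from this thread whether that is what happened in the original post, so treat this as a guess. Calling reshape to get the new shape reinterprets the memory and scrambles the pixels, whereas moveaxis (or transpose) actually moves the channel axis. A tiny sketch with a made-up 4x4 array:

```python
import numpy as np

# Made-up stand-in for pixel_values[0]: channels-first, shape (3, 4, 4).
chw = np.arange(3 * 4 * 4).reshape(3, 4, 4)

# Correct: move the channel axis to the end; each pixel keeps its 3 channel values.
hwc_correct = np.moveaxis(chw, 0, -1)

# Wrong: same target shape, but the flat memory is just re-chunked.
hwc_wrong = chw.reshape(4, 4, 3)

print(hwc_correct.shape, hwc_wrong.shape)                # (4, 4, 3) (4, 4, 3)
print(np.array_equal(hwc_correct[0, 0], chw[:, 0, 0]))   # True
print(np.array_equal(hwc_wrong[0, 0], chw[:, 0, 0]))     # False
```

Both arrays have the right shape, so nothing errors out, but only the moveaxis version still holds the original picture.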

Sushmitaupadhyay · March 20, 2024, 5:15am

This is how I have run the code

Sushmitaupadhyay · March 20, 2024, 5:17am

This is the output you get after unnormalizing. My question is why, after passing through the ViTImageProcessor, the image gets smaller and is arranged in patches. You can refer to the output that I pasted.

raygx · April 18, 2024, 10:15pm

I also came across this very question. And finally I figured out what was happening.
What is ViTImageProcessor doing? - #4 by raygx.
Check the last reply that I gave. You’ll know how to reconstruct the image.
@everyone
