ViTImageProcessor output visualization

Sandy1857 · March 8, 2024, 7:22pm

ViT takes an input at a resolution of 224x224. The ViTImageProcessor just handles resizing the image to that resolution and normalising it.
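To see what that looks like in practice, here is a minimal sketch. It assumes a recent `transformers` 4.x install with Pillow and NumPy, and it builds the processor with its default settings (resize to 224x224, rescale by 1/255, normalise with mean and std of 0.5) rather than loading the config of your actual checkpoint, which may differ. The random 300x400 image is just a stand-in for your own.

```python
import numpy as np
from PIL import Image
from transformers import ViTImageProcessor

# Assumption: library defaults, not the settings of any particular checkpoint.
processor = ViTImageProcessor()

# Stand-in RGB image, 300 pixels high and 400 pixels wide.
rng = np.random.default_rng(0)
image = Image.fromarray(rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8))

# The processor resizes, rescales to [0, 1], then normalises each channel.
inputs = processor(images=image, return_tensors="np")
pixel_values = inputs["pixel_values"]

# Layout is (batch, channels, height, width).
print(pixel_values.shape)
print(pixel_values.min(), pixel_values.max())
```

With these defaults the shape should come out as (1, 3, 224, 224), so the channel dimension is the second one, not the last.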

The 16x16 patches that you mentioned are taken over this processed image, which the ViT model then consumes.
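The patch arithmetic can be sketched with plain NumPy. This is only an illustration of the bookkeeping: as far as I know the model itself does the patching internally with a strided convolution, which is equivalent to flattening each patch and applying a linear layer. The random array below is a stand-in for one processed image, not real processor output.

```python
import numpy as np

image_size, patch_size, channels = 224, 16, 3

# Stand-in for one processed image in (channels, height, width) layout.
rng = np.random.default_rng(0)
pixel_values = rng.standard_normal((channels, image_size, image_size)).astype(np.float32)

grid = image_size // patch_size  # 14 patches along each side

# Split height and width into (grid, patch_size), then bring the grid axes to the front.
patches = (
    pixel_values.reshape(channels, grid, patch_size, grid, patch_size)
    .transpose(1, 3, 0, 2, 4)  # (grid_rows, grid_cols, channels, patch_h, patch_w)
    .reshape(grid * grid, channels * patch_size * patch_size)
)

print(patches.shape)  # (196, 768)
```

So a 224x224 RGB image gives 14 x 14 = 196 patches, each holding 16 x 16 x 3 = 768 values.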

Additionally, if you gave the ViTImageProcessor an RGB image, you should get an RGB image back, just resized to 224x224 with 3 channels. I don't know how you got a 6-channel image.
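If you want to look at the processed image, one way is to check the channel dimension and then undo the normalisation. The sketch below uses a random array in place of real `pixel_values`, and it assumes a mean and std of 0.5 per channel; use the `image_mean` and `image_std` of your own processor if they differ.

```python
import numpy as np

# Assumed normalisation constants, shaped to broadcast over (channels, height, width).
mean = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)
std = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)

# Stand-in for processor output in (batch, channels, height, width) layout.
rng = np.random.default_rng(0)
pixel_values = rng.uniform(-1.0, 1.0, size=(1, 3, 224, 224)).astype(np.float32)

# An RGB input should leave exactly 3 channels in dimension 1.
assert pixel_values.shape[1] == 3

# Undo the normalisation, move channels last, and convert to 8-bit for viewing.
chw = pixel_values[0] * std + mean
hwc = np.clip(chw, 0.0, 1.0).transpose(1, 2, 0)
viewable = (hwc * 255).round().astype(np.uint8)

print(viewable.shape, viewable.dtype)  # (224, 224, 3) uint8
```

The `viewable` array can then be passed to something like `PIL.Image.fromarray` or `matplotlib.pyplot.imshow`. Plotting the tensor directly without moving the channels last is a common way to end up with a confusing picture.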
