Idefics3 preprocessing

Hennara · December 1, 2024, 1:53pm

Hello,
I want to ask about the preprocessing step in the Idefics3 model.
First, I’ve noticed that we always resize images to longest_edge: 4x364. Is that correct?
How do we preserve the aspect ratio if we always resize the image to [1456, 1456]?
Why do we add a dimension of size 1 at the beginning of the processed patches, [1, batch*num_patches, channel, 364, 364]?
Why do we use the pixel_attention_mask if it is always one? I tried two images of different sizes, image_1 at 2000^2 and image_2 at 250^2. The output of the processor was [1, 34, 3, 364, 364], and the pixel_attention_mask had the same shape with every entry True. So what is the point of it, and how do we distinguish the padding?
I’ve looked at the Idefics3 implementation in this file: idefics3_processing
Please help me understand these points, and correct me if I have misunderstood anything.

LiamLLucas · December 23, 2024, 5:42am

Idefics3 preprocessing resizes images to [1456, 1456] for batching and uses pixel attention masks for padding.
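
For what it’s worth, the [1, 34, 3, 364, 364] shape from the question can be reproduced with a bit of arithmetic. The sketch below is not the library’s code. It assumes the processor scales each image so that its longest edge becomes 4x364 = 1456 while keeping the aspect ratio, cuts the result into 364x364 tiles, and adds one downscaled global view per image. The function name sketch_patch_count and its parameters are made up for illustration.

```python
import math

def sketch_patch_count(height, width, longest_edge=1456, patch=364):
    """Hypothetical helper: estimate the resized size and the number of
    364x364 crops for one image, under the assumptions stated above."""
    # Scale so the longest side equals longest_edge, keeping the aspect ratio.
    scale = longest_edge / max(height, width)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))
    # Tiles needed to cover the resized image (a partial tile would be padded).
    rows = math.ceil(new_h / patch)
    cols = math.ceil(new_w / patch)
    # Assumption: one extra global view of the whole image is appended.
    return (new_h, new_w), rows * cols + 1

size_1, n_1 = sketch_patch_count(2000, 2000)  # image_1 from the question
size_2, n_2 = sketch_patch_count(250, 250)    # image_2 from the question
size_3, n_3 = sketch_patch_count(1000, 2000)  # a non-square example
print(size_1, n_1, size_2, n_2, n_1 + n_2)
print(size_3, n_3)
```

Under these assumptions, both square images end up at 1456x1456 and give 4x4 tiles plus one global view each, so 17 + 17 = 34 crops, which matches the second dimension in the question. A 1000x2000 image would become 728x1456 in this sketch, so only square inputs land on [1456, 1456]. If this reading is right, the leading 1 would be the batch dimension (one sample holding both images), and the mask would be all True here simply because every tile of a square image is fully covered by real pixels. Padding would only show up with inputs that do not fill their tiles, or when samples in a batch have different numbers of crops.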
