ConvNextImageProcessor weird resize behaviour when input image is 224x224

Hi!

I am working on an image classification task and ran into the issue that the results of my trainer.predict() and pipeline(…) calls showed huge differences. I could track the issue down to the image processor.

I am using ConvNeXTV2, which uses the ConvNextImageProcessor. My original input images are 1200x1920px. As the relevant information is only in the center of the image, I manually cropped to 1100x600px, resized to 224x224 and used that as input for training and validation. The results look good there.

I am using a 224 checkpoint (facebook/convnextv2-tiny-22k-224) and found out that the ConvNextImageProcessor behaves differently for a 224 versus a 384 input size.

For shortest_edge=384 the images are simply resized, which is the behaviour I expected. But for shortest_edge=224 there is more going on: the image is first resized depending on the crop_pct factor, and then a 224x224 square is cropped out and used.
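
For reference, here is a minimal sketch of how I looked at this (the attribute names are just what I see on the processor instance, so treat them as approximate):

```python
from PIL import Image
from transformers import AutoImageProcessor

# load the processor that belongs to the 224 checkpoint mentioned above
processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-224")
print(processor.size, processor.crop_pct)  # e.g. {'shortest_edge': 224} and 0.875

# for shortest_edge < 384 the shorter edge is first resized to
# int(shortest_edge / crop_pct), e.g. int(224 / 0.875) = 256, and only then
# a 224x224 square is center-cropped, so the border of a 224x224 input is lost
image = Image.new("RGB", (224, 224))
outputs = processor(image, return_tensors="pt")
print(outputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```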

In my case I am losing relevant information and the score drops massively. Why is the ConvNextImageProcessor behaving differently depending on shortest_edge?

Also, for training I just resized the images to 224x224. At inference the picture looks completely different when using the ConvNextImageProcessor, as it resizes with a locked aspect ratio and then crops.

What is the right approach to handle this? Should I adapt the preprocessing of my training images to match the ConvNextImageProcessor behaviour? But how am I supposed to know what exactly happens in each of the image processors?

Or should I just use a model that uses shortest_edge=384?

Hope you can help.

Hi,

The ConvNextImageProcessor class replicates the original data transformations during evaluation (source).

  • If the size of the images is 384 or higher, the authors decided to resize the images to a shortest_edge x shortest_edge square and normalize them.
  • If the size is smaller, they first resize the shorter edge to shortest_edge / crop_pct (with the default crop_pct = 224/256 = 0.875 that means 256 for a 224 model), then center crop a shortest_edge x shortest_edge square, then normalize. See the sketch after this list.
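
Roughly, in torchvision terms the evaluation logic looks like the following (this is only a sketch of the two branches, not the actual implementation; the normalization statistics should be taken from the checkpoint's preprocessor_config.json):

```python
from torchvision import transforms

def convnext_eval_transform(shortest_edge, crop_pct=224 / 256,
                            mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    if shortest_edge >= 384:
        # large resolutions: plain resize to a square, no cropping
        return transforms.Compose([
            transforms.Resize((shortest_edge, shortest_edge)),
            transforms.ToTensor(),
            transforms.Normalize(mean, std),
        ])
    # smaller resolutions: resize the shorter edge to shortest_edge / crop_pct,
    # then center-crop a shortest_edge x shortest_edge square
    resize_size = int(shortest_edge / crop_pct)  # 256 for a 224 model
    return transforms.Compose([
        transforms.Resize(resize_size),
        transforms.CenterCrop(shortest_edge),
        transforms.ToTensor(),
        transforms.Normalize(mean, std),
    ])
```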

If you prefer to use the pipeline at inference time, then it’s advised to use the same preprocessing settings as the image processor (which the pipeline uses underneath) during training, so that both align.
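
For example, one way to keep both aligned is to apply the image processor itself inside the training dataset transform (a sketch, assuming a datasets.Dataset with an "image" column):

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-224")

def preprocess(examples):
    # run exactly the same preprocessing that the pipeline applies at inference
    examples["pixel_values"] = processor(
        examples["image"], return_tensors="pt"
    )["pixel_values"]
    return examples

# dataset = dataset.with_transform(preprocess)
```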

Hi!

Thanks for the fast reply!

This means that if I don’t want my input images to be cropped, I can either avoid it by not using a pipeline and doing my own preprocessing, or use a 384 model. Right?

Currently, for training I created my own transforms for the train dataset (some augmentation, cropping, resize) and the validation dataset (only resize).

What would be the right approach here? Should I have a look at the code and replicate what is there in my own transforms for training and validation? Or should I rather use the build_transform method that you were referencing in the code? I guess I can’t use the preprocess method that comes with the transformers ConvNextImageProcessor, as it does not return a transforms object.
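
To make my question more concrete, I was thinking of something along these lines for the validation transform (just a sketch on my side, reading the settings off the processor instance rather than hardcoding them):

```python
from torchvision import transforms
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-224")

shortest_edge = processor.size["shortest_edge"]        # 224 for this checkpoint
resize_size = int(shortest_edge / processor.crop_pct)  # 256 with the default crop_pct

val_transform = transforms.Compose([
    transforms.Resize(resize_size),
    transforms.CenterCrop(shortest_edge),
    transforms.ToTensor(),
    transforms.Normalize(processor.image_mean, processor.image_std),
])
```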

Thank you!