ConvNextImageProcessor: unexpected resize behaviour when shortest_edge is 224

Hi!

I am working on an image classification task and ran into an issue where the results of trainer.predict() and pipeline(…) differed hugely. I traced the problem to the image processor.

I am using ConvNeXT V2, which uses the ConvNextImageProcessor. My original input images are 1200x1920px. Since only the center of the image contains relevant information, I manually cropped to 1100x600px, resized to 224x224, and used that as input for training and validation. The results there look good.

I am using a 224 variant (facebook/convnextv2-tiny-22k-224) and found that the ConvNextImageProcessor behaves differently for an input size of 224 vs. 384.

For shortest_edge=384 the images are simply resized as they are, which is what I would expect. But for shortest_edge=224 there is more going on: the image is first resized depending on the crop_pct factor, and then a 224x224 square is cropped out and used.
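To make the two branches concrete, here is a minimal sketch of the size logic as I understand it (simplified; assuming the default crop_pct of 224/256 = 0.875, and my rounding may differ slightly from the library's internal helper):

```python
def convnext_resize_plan(height, width, shortest_edge=224, crop_pct=0.875):
    """Sketch of ConvNextImageProcessor's resize behaviour (assumption,
    not the library's exact code). Returns the intermediate resize target
    and the final center-crop size, as (height, width) tuples."""
    if shortest_edge >= 384:
        # large sizes: plain resize to a square, no cropping
        return {"resize_to": (shortest_edge, shortest_edge), "crop_to": None}
    # small sizes: resize so the shortest edge becomes shortest_edge / crop_pct
    # while preserving aspect ratio, then center-crop a square out of it
    resize_shortest = int(shortest_edge / crop_pct)
    scale = resize_shortest / min(height, width)
    resize_to = (round(height * scale), round(width * scale))
    return {"resize_to": resize_to, "crop_to": (shortest_edge, shortest_edge)}

# My 1100x600 crop with shortest_edge=224: resized to 256x469,
# then only a 224x224 window survives -- under half the width is kept.
print(convnext_resize_plan(600, 1100, 224))
# With shortest_edge=384 the same image would just be squashed to 384x384.
print(convnext_resize_plan(600, 1100, 384))
```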

In my case I am losing relevant information and the score drops massively. Why does the ConvNextImageProcessor behave differently depending on shortest_edge?

Also, for training I simply resized the images to 224x224. At inference the picture looks completely different, because the ConvNextImageProcessor resizes with a fixed aspect ratio and then crops.

What is the right approach to handle this? Should I adapt the preprocessing of my training images to match the ConvNextImageProcessor's behaviour? But how am I supposed to know exactly what each of the ImageProcessors does?
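One workaround I am considering (a sketch, not sure it is the intended approach): keep my own crop + 224x224 resize exactly as in training, and construct the processor with do_resize=False so it only rescales and normalizes. The crop coordinates and the zero-filled stand-in frame below are purely illustrative:

```python
import numpy as np
from PIL import Image
from transformers import ConvNextImageProcessor

# Assumption: with do_resize=False the processor skips the crop_pct
# resize + center crop entirely and only rescales/normalizes.
processor = ConvNextImageProcessor(do_resize=False)

# Stand-in for a real 1920x1200 frame (PIL sizes are width x height).
frame = Image.fromarray(np.zeros((1200, 1920, 3), dtype=np.uint8))

# Illustrative coordinates for the 1100x600 region of interest.
roi = frame.crop((410, 300, 1510, 900))   # -> 1100x600
roi = roi.resize((224, 224))              # same resize as in training
inputs = processor(roi)                   # no further resize/crop here
print(inputs["pixel_values"][0].shape)    # (3, 224, 224)
```

If that works, I could presumably also pass this processor to the pipeline via its image_processor argument so that predict and pipeline see identical pixels.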

Or should I just use a model that uses shortest_edge=384?

Hope you can help.