Hi!
I am working on an image classification task and ran into an issue: the results of my trainer.predict() and pipeline(…) calls showed huge differences. I traced the issue to the image processor.
I am using ConvNeXt V2, which uses the ConvNextImageProcessor. My original input images are 1200x1920px. As there is only relevant information in the center of the image, I manually cropped to 1100x600px, resized to 224x224 and used that as input for training and validation. The results look good there.
I am using a 224 variant (facebook/convnextv2-tiny-22k-224) and found out that the ConvNextImageProcessor behaves differently for a 224 input size than for 384.
For shortest_edge=384 the images are simply resized as they are, which is the behaviour I expected. But for shortest_edge=224 there is more going on: the image is first resized depending on the crop_pct factor (keeping the aspect ratio), and then a 224x224 square is center-cropped out and used.
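To illustrate, this is what I observe with the 224 checkpoint (a minimal sketch; the dummy image only mimics my input's aspect ratio):

```python
from PIL import Image
from transformers import ConvNextImageProcessor

processor = ConvNextImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-224")
print(processor.size)      # {'shortest_edge': 224}
print(processor.crop_pct)  # 0.875, i.e. 224/256

# Dummy image that only mimics my input dimensions (width x height).
image = Image.new("RGB", (1920, 1200))

# Because shortest_edge < 384, the processor first resizes the shorter side
# to shortest_edge / crop_pct = 224 / 0.875 = 256 while keeping the aspect
# ratio (here roughly 410x256) and then center-crops a 224x224 square,
# which throws away the outer parts of the image.
pixel_values = processor(image, return_tensors="pt")["pixel_values"]
print(pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```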
In my case I am losing relevant information and the score goes down massively. Why is the ConvNextImageProcessor behaving differently depending on the shortest_edge?
Also, for training I just resized the images to 224x224, but at inference the picture looks completely different because the ConvNextImageProcessor resizes with a locked aspect ratio and then crops.
What is the right approach to handle that? Should I adapt the preprocessing of my training images to match the ConvNextImageProcessor behaviour (one attempt sketched below)? But then how am I supposed to know exactly what happens in each of the ImageProcessors?
Or should I just use a model that uses shortest_edge=384?
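For what it's worth, here is the workaround I am currently experimenting with (just a sketch, assuming do_resize=False really skips the processor's resize-and-crop step and leaves only rescaling and normalization; the crop coordinates are placeholders for my actual ones):

```python
from PIL import Image
from transformers import ConvNextImageProcessor

processor = ConvNextImageProcessor.from_pretrained("facebook/convnextv2-tiny-22k-224")

def preprocess(image: Image.Image):
    # Reproduce my training pipeline: cut out the relevant 1100x600 region
    # (a center crop here as a placeholder for my actual coordinates) and
    # warp-resize it to 224x224, ignoring the aspect ratio.
    left = (image.width - 1100) // 2
    top = (image.height - 600) // 2
    image = image.crop((left, top, left + 1100, top + 600))
    image = image.resize((224, 224), Image.BILINEAR)
    # With do_resize=False the processor skips its resize + center crop and
    # only rescales to [0, 1] and normalizes with the model's mean/std.
    return processor(image, do_resize=False, return_tensors="pt")
```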
Hope you can help.