SegFormer image segmentation inference depends on resolution

I am seeing quite abnormal behavior with SegFormer inference.

It appears that, sometimes, a better segmentation is obtained when the resolution (the image size in pixels, not the Ground Sampling Distance, i.e. the real physical distance between two pixels) is divided by two, for example. Yet this type of architecture is supposed to be robust to such resolution changes thanks to its hierarchical encoder, which extracts features across multiple scales. So why does this happen?

My intuition was that by downscaling the image, for the same 512x512 input patch size, each patch covers more context and focuses less on local details. But sometimes the opposite holds, and I am lost.
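For reference, here is a minimal sketch of the kind of comparison I mean, full resolution versus half resolution for the same model. The checkpoint name and file path are just examples, not my actual setup:

```python
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# Example checkpoint, used here only as a stand-in for my own model.
ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(ckpt)
model = SegformerForSemanticSegmentation.from_pretrained(ckpt)
model.eval()

def predict(image: Image.Image) -> torch.Tensor:
    """Run SegFormer and return the per-pixel class map at the image's size."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, num_classes, H/4, W/4)
    logits = torch.nn.functional.interpolate(
        logits, size=image.size[::-1], mode="bilinear", align_corners=False
    )
    return logits.argmax(dim=1)[0]

image = Image.open("patch.png").convert("RGB")      # e.g. a 512x512 patch
mask_full = predict(image)
# Same scene downscaled by 2: a 512x512 crop now covers 4x the ground area.
half = image.resize((image.width // 2, image.height // 2), Image.BILINEAR)
mask_half = predict(half)
```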

The thing is, I would like to automate the process, so how can I know in advance when to reduce the resolution? I have tried an edge-detection measure, to check whether the behavior is driven by the amount of fine detail, but without success.
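To be concrete, here is roughly what I mean by the edge-based heuristic: downscale only when the patch looks "too detailed" according to Canny edge density. The thresholds and the decision rule are arbitrary placeholders, which is exactly the part I have not managed to make reliable:

```python
import cv2
import numpy as np

def should_downscale(image_bgr: np.ndarray, edge_ratio_threshold: float = 0.08) -> bool:
    """Return True if the fraction of Canny edge pixels exceeds the threshold."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # thresholds picked by hand
    edge_ratio = float(np.count_nonzero(edges)) / edges.size
    return edge_ratio > edge_ratio_threshold

image = cv2.imread("patch.png")
if should_downscale(image):
    image = cv2.resize(
        image,
        (image.shape[1] // 2, image.shape[0] // 2),
        interpolation=cv2.INTER_AREA,
    )
```

Is there a more principled criterion than this kind of hand-tuned edge density for deciding when to run inference at a lower resolution?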
