I’m running OWLv2 over a dataset of 30k images and 75 classes.
The same dataset ran well with the previous OWL-ViT version. With the new version, the detected bounding boxes are misaligned. I attach an example: on the left is OWL-ViT (v1), on the right is OWLv2. The same problem shows up across thousands of images. (See the boat and cloud bounding boxes at the top right, for example.)
This obviously hurts the quality of the results a lot. Very strange. Has anyone else run into the same problem? Thanks.
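In case it helps narrow things down, here is a small sketch of one thing that could be checked. It is only a guess, not a confirmed cause: it assumes the model outputs normalized (cx, cy, w, h) boxes and that the newer preprocessing pads the image to a square before resizing, so the boxes would be relative to the padded square rather than the original image. The function names are hypothetical and no library API is used.

```python
# Sketch under stated assumptions: boxes are normalized (cx, cy, w, h) in [0, 1].
# Compares two ways of mapping them back to pixel coordinates.

def to_pixels(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )

def rescale_plain(box, img_w, img_h):
    """Assumes the box is relative to the original image size."""
    return to_pixels(box, img_w, img_h)

def rescale_padded(box, img_w, img_h):
    """Assumes the box is relative to a square of side max(img_w, img_h)
    (hypothetical square padding applied before resizing)."""
    side = max(img_w, img_h)
    return to_pixels(box, side, side)

if __name__ == "__main__":
    box = (0.5, 0.5, 0.25, 0.25)
    print("plain :", rescale_plain(box, 640, 480))
    print("padded:", rescale_padded(box, 640, 480))
```

If the boxes line up with the objects under the padded-square rescaling but not under the plain one, that would point to a rescaling mismatch in post-processing rather than to the detections themselves. For a landscape image the two differ only along the y axis, and for a portrait image only along x.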