OWL-ViT inference quality metrics

I’m successfully running inference on a set of images using OWL-ViT. For each image I get many predictions, each with a confidence score.

Question: is there a way to evaluate the overall quality of the predictions, for example with a metric like AP (average precision) or something similar?
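To make the question concrete, here is a minimal sketch of the kind of metric I mean. It assumes ground-truth boxes are available for the images and that there is a single class. The function names (`iou`, `average_precision`), the `(image_id, score, box)` prediction format, and the 0.5 IoU threshold are hypothetical choices for illustration, not part of OWL-ViT. The matching is a simple Pascal VOC style greedy scheme, so it may not match COCO-style numbers exactly.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """Single-class AP (hypothetical helper).

    preds: list of (image_id, score, box)
    gts:   dict mapping image_id -> list of ground-truth boxes
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0

    # Track which ground-truth boxes have already been claimed.
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}

    # Walk predictions from most to least confident, marking each TP or FP.
    tp_flags = []
    for img, _score, box in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(gts.get(img, [])):
            v = iou(box, gt_box)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Precision and recall after each prediction.
    tp, precisions, recalls = 0, [], []
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Make precision non-increasing (right-to-left running maximum).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Something like `average_precision(preds, gts)` would then give one number per class, and averaging over classes would give mAP. If an existing tool such as pycocotools or the torchmetrics `MeanAveragePrecision` metric is the recommended way to do this with OWL-ViT outputs, that would be good to know too.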

Thanks.