What bounding box format does Grounding DINO use?

I think the boxes are converted internally, as long as you pass the annotations in the COCO-style format described below.

The image_processor expects the annotations to be in the following format: {'image_id': int, 'annotations': list[Dict]}, where each dictionary is a COCO object annotation. In COCO, a bbox is given as [x_min, y_min, width, height] in absolute pixel coordinates.
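
To make that concrete, here is a rough sketch of what such an annotation dict can look like, and of the kind of conversion that DETR-style processors typically apply: COCO [x_min, y_min, width, height] in pixels to normalized (center_x, center_y, width, height). I'm assuming Grounding DINO follows that convention. The helper coco_to_normalized_cxcywh is a hypothetical name of mine, not a library function, and the image size and box values are made up.

```python
def coco_to_normalized_cxcywh(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    normalized [center_x, center_y, width, height] in the range 0..1.

    Hypothetical helper for illustration; not part of any library.
    """
    x_min, y_min, width, height = bbox
    center_x = (x_min + width / 2) / image_width
    center_y = (y_min + height / 2) / image_height
    return [center_x, center_y, width / image_width, height / image_height]


# Annotation dict in the format quoted above (values are invented).
target = {
    "image_id": 0,
    "annotations": [
        {
            "bbox": [100.0, 50.0, 200.0, 100.0],  # COCO: x_min, y_min, width, height
            "category_id": 1,
            "area": 20000.0,
            "iscrowd": 0,
        },
    ],
}

# Assumed image size of 800x400 pixels, purely for the example.
converted = [
    coco_to_normalized_cxcywh(ann["bbox"], image_width=800, image_height=400)
    for ann in target["annotations"]
]
print(converted)
```

If the processor does this conversion for you, you would pass target as-is together with the image and not normalize the boxes yourself.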