I followed the Object Detection guide to fine-tune a DETR model. However, I am encountering an issue where the model is detecting the same objects multiple times, leading to redundant bounding boxes. Additionally, some of the detected objects are inaccurate, either misclassified or poorly localized. This affects the overall quality of the object detection results, making it difficult to integrate the outputs effectively for downstream tasks such as image captioning. Thanks for helping!!!
Notebook link: [Google Colab]
Example training image:
It seems to be a common problem…
The issue you are encountering with the DETR model detecting the same objects multiple times and producing inaccurate detections is a common challenge in object detection tasks. The problem likely stems from how the model is trained and the post-processing steps used during inference. Below are some potential solutions and insights based on the provided sources:
- **Understanding the Problem**: DETR predicts a fixed number of bounding boxes (set by `num_queries`) for each image. If the number of queries exceeds the number of actual objects in the image, the model may produce redundant or overlapping bounding boxes. This is a known issue in DETR, especially for datasets with few objects per image [1][2].
- **Post-Processing with NMS**: DETR does not apply Non-Maximum Suppression (NMS) at inference, a standard post-processing step that removes redundant bounding boxes. Applying NMS to the model's predictions keeps only the highest-confidence box per object [2].
- **Hungarian Algorithm and Matching**: During training, DETR uses the Hungarian algorithm to match predicted boxes one-to-one with ground-truth boxes; this matching is not applied at inference. Training with a robust matching cost function can help reduce duplicate detections [2].
- **Training for Longer Periods**: Increasing the number of training epochs can improve the model's ability to distinguish true objects from false positives; performance often improves substantially with more training, as seen in similar cases [1].
- **Adjusting `num_queries`**: If your dataset typically contains fewer objects per image, reducing the `num_queries` parameter from the default 100 to a value closer to the maximum number of objects per image in your dataset can reduce redundant predictions [1].
- **Improving Localization and Classification**: DETR's performance can degrade on small objects, but it is generally effective for most object detection tasks. Proper fine-tuning on your dataset can improve both localization and classification accuracy [3].
- **Alternative Approaches**: If the issue persists, consider DETR variants such as NAN-DETR, which introduces a multi-anchor strategy and a centralization noising mechanism to improve detection accuracy and reduce redundant detections [4].
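As a concrete illustration of the NMS step above, here is a minimal sketch using `torchvision.ops.nms` on already-decoded boxes (corner format `[x1, y1, x2, y2]`, one confidence score per box). The helper name and threshold are illustrative, and converting DETR's raw outputs into this form (e.g. via the processor's `post_process_object_detection`) depends on your setup:

```python
import torch
from torchvision.ops import nms

def suppress_duplicates(boxes, scores, labels, iou_threshold=0.5):
    """Keep only the highest-scoring box among overlapping ones, per class."""
    keep = []
    for cls in labels.unique():
        idx = (labels == cls).nonzero(as_tuple=True)[0]
        kept = nms(boxes[idx], scores[idx], iou_threshold)
        keep.append(idx[kept])
    keep = torch.cat(keep)
    return boxes[keep], scores[keep], labels[keep]

# Two near-identical boxes for the same object, plus one distinct box.
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 11., 51., 49.],
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.7])
labels = torch.tensor([1, 1, 2])

b, s, l = suppress_duplicates(boxes, scores, labels)
# The lower-scoring duplicate of the first object is suppressed.
```

Running NMS per class (rather than globally) avoids suppressing overlapping boxes that belong to different classes.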
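The Hungarian matching described above can be sketched with SciPy's `linear_sum_assignment`. The cost matrix below is a toy stand-in; DETR's actual matching cost combines class probability, L1 box distance, and generalized IoU:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j] = cost of matching prediction i to ground-truth object j.
# Four queries, three ground-truth objects: one query stays unmatched.
cost = np.array([
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
    [0.8, 0.9, 0.3],
    [0.6, 0.7, 0.4],
])
pred_idx, gt_idx = linear_sum_assignment(cost)
# pred_idx -> gt_idx is the minimum-cost one-to-one assignment;
# unmatched predictions are trained to output the "no object" class.
```

Because the assignment is one-to-one, each ground-truth object supervises exactly one query, which is why DETR can in principle learn to avoid duplicates without NMS; in practice duplicates can still appear at inference.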
Given these insights, I recommend adjusting the `num_queries` parameter, implementing NMS during inference, and potentially extending the training period. If you continue to face issues, exploring advanced versions of DETR, like NAN-DETR, might provide better results.
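For the `num_queries` change, a minimal configuration sketch; the values 25 and 5 are hypothetical, so pick `num_queries` slightly above the maximum object count per image in your dataset and `num_labels` to match your class count:

```python
from transformers import DetrConfig

# Hypothetical values for illustration.
config = DetrConfig(
    num_queries=25,  # down from the default 100
    num_labels=5,    # your number of object classes
)
```

When fine-tuning from a pretrained checkpoint, pass the same arguments to `DetrForObjectDetection.from_pretrained(..., ignore_mismatched_sizes=True)` so the resized query embeddings and classification head are freshly initialized instead of raising a shape error.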
For additional guidance, refer to the following sources for details on DETR’s architecture, training process, and potential improvements: [1][2][3][4].