Combining Multi-Output Image Classification with Object Detection for Environment Inspection

ThirzaOPPO258 · August 15, 2025, 7:39am

Hi all,

I’m working on a computer vision project for environment inspection.
The application takes a photo from a fixed, specified angle so dataset consistency is maintained, and the dataset itself also follows this angle requirement.

Objective:

First, determine which rule set applies to that photo (e.g., Rule 1, Rule 2, etc.).
Then, determine if the environment is Good or Not Good according to that rule.
If Not Good, detect and locate the objects that cause the violation.

Workflow I’m considering:

Step 1 — Image Classification:
- Multi-class classification for the rule set (Rule 1, Rule 2, etc.).
- Binary classification for Good vs Not Good (could be another output head of the same model).
Step 2 — Object Detection (only if Not Good):
- Detect items that cause the problem, based on the specific rule set identified earlier.

What I’ve tried so far:

This was before I tried to negotiate, because the dataset at that time was very varied. But now I have access to create a new dataset, and I’m still confused about how to combine them.

Image Classification (ConvNeXt)
- Built a binary classification dataset: Good / Not Good.
- Training loss and accuracy looked good.
- Result:
  - Works in some cases (e.g., empty table = Good).
  - Fails in others (e.g., full table that’s still acceptable = classified as Not Good).
  - Seems to overfit to simple visual cues like “clutter = bad.”
Object Detection (YOLO)
- Labeled Not Good examples with bounding boxes showing the issues.
- Trained YOLO to only detect Not Good objects (no detection = Good).
- Result:
  - Very poor training accuracy.
  - Main problem seems to be inconsistent bounding boxes — varied size, position, and coverage across images.
  - Dataset is too inconsistent for the model to learn clear patterns.

Questions:

Should I train the classification as two outputs in one model (rule type + good/bad), or train two separate classification models?
What’s the best way to integrate object detection into this pipeline so it only runs for Not Good cases?
Any model or workflow recommendations for combining classification and detection efficiently?

Thanks in advance for your insights!

John6666 · August 16, 2025, 1:07am

1

I think it would be better to try two outputs in one model first. Knowledge about other tasks may also provide clues for the model.

Topic		Replies	Views
Struggling to Build an AI That Can Detect “Not Good” vs “Good” Environment Photos Intermediate	1	43	October 8, 2025
Multi-Task Learning to perform two separate classifaction tasks on the same training data Beginners	0	765	May 6, 2021
ObjectDetectionOutput 🤗Transformers	0	139	July 4, 2023
Multi-class classification and Hosted inference API 🤗Hub	0	661	October 27, 2021
Image Classification for photos of documents Beginners	0	157	October 25, 2022

Combining Multi-Output Image Classification with Object Detection for Environment Inspection

Related topics