Hi all,
I’m working on a computer vision project for environment inspection.
The application takes each photo from a fixed, specified angle, and the training dataset follows the same angle requirement, so the data stays consistent.
Objective:
- First, determine which rule set applies to that photo (e.g., Rule 1, Rule 2, etc.).
- Then, determine if the environment is Good or Not Good according to that rule.
- If Not Good, detect and locate the objects that cause the violation.
Workflow I’m considering:
- Step 1: Image Classification
  - Multi-class classification for the rule set (Rule 1, Rule 2, etc.).
  - Binary classification for Good vs Not Good (could be another output head of the same model).
- Step 2: Object Detection (only if Not Good)
  - Detect items that cause the problem, based on the specific rule set identified earlier.
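To make the workflow concrete, here is a rough sketch of the control flow I have in mind, in plain Python. Everything named here (`run_inspection`, `InspectionResult`, the box format, the 0.5 threshold, and the fake models) is a placeholder I made up for illustration, not a real library API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (x_min, y_min, x_max, y_max, label): hypothetical box format for this sketch.
Box = Tuple[float, float, float, float, str]


@dataclass
class InspectionResult:
    rule: str
    is_good: bool
    violations: List[Box] = field(default_factory=list)


def run_inspection(
    image: object,
    classify: Callable[[object], Tuple[str, float]],
    detectors: Dict[str, Callable[[object], List[Box]]],
    good_threshold: float = 0.5,
) -> InspectionResult:
    """Run Step 1, then Step 2 only when the image is judged Not Good."""
    # Step 1: one call returns the rule set and P(Good) for that image.
    rule, p_good = classify(image)
    if p_good >= good_threshold:
        # Good: skip detection entirely.
        return InspectionResult(rule=rule, is_good=True)
    # Step 2: run only the detector that belongs to the identified rule set.
    boxes = detectors[rule](image)
    return InspectionResult(rule=rule, is_good=False, violations=boxes)


# Stand-in models so the control flow can be exercised without real weights.
def fake_classify(image):
    return ("Rule 1", 0.2) if image == "cluttered" else ("Rule 1", 0.9)


def fake_rule1_detector(image):
    return [(10.0, 20.0, 50.0, 60.0, "misplaced_item")]


detectors = {"Rule 1": fake_rule1_detector}
good = run_inspection("empty", fake_classify, detectors)
bad = run_inspection("cluttered", fake_classify, detectors)
print(good.is_good, len(good.violations))
print(bad.is_good, bad.violations[0][4])
```

In the real version, `classify` would wrap the classification model and each entry in `detectors` would wrap a detector (or a class filter on one shared detector) for that rule set.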
What I’ve tried so far:
These attempts came before I tried to negotiate, when the dataset was still very varied. I now have permission to create a new dataset, but I’m still unsure how to combine classification and detection.
- Image Classification (ConvNeXt)
  - Built a binary classification dataset: Good / Not Good.
  - Training loss and accuracy looked good.
  - Result:
    - Works in some cases (e.g., empty table = Good).
    - Fails in others (e.g., full table that’s still acceptable = classified as Not Good).
    - Seems to overfit to simple visual cues like “clutter = bad.”
- Object Detection (YOLO)
  - Labeled Not Good examples with bounding boxes showing the issues.
  - Trained YOLO to only detect Not Good objects (no detection = Good).
  - Result:
    - Very poor training accuracy.
    - Main problem seems to be inconsistent bounding boxes: varied size, position, and coverage across images.
    - Dataset is too inconsistent for the model to learn clear patterns.
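To put a number on the bounding-box inconsistency, something like the following could measure how much box area varies per class. This is only a sketch: it assumes the labels are in the usual YOLO text format (`class x_center y_center width height`, normalized to the image size), and the sample labels are made up for illustration, not taken from my dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev


def parse_yolo_labels(text):
    """Parse YOLO-format label lines: class x_center y_center width height (normalized)."""
    boxes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cls = int(parts[0])
        xc, yc, w, h = map(float, parts[1:])
        boxes.append((cls, xc, yc, w, h))
    return boxes


def area_spread_per_class(boxes):
    """Return {class_id: (mean_area, coefficient_of_variation)} over normalized box areas."""
    areas = defaultdict(list)
    for cls, _xc, _yc, w, h in boxes:
        areas[cls].append(w * h)
    stats = {}
    for cls, values in areas.items():
        m = mean(values)
        cv = pstdev(values) / m if m > 0 else 0.0
        stats[cls] = (m, cv)
    return stats


# Made-up labels: class 0 is annotated consistently, class 1 is not.
sample = """
0 0.50 0.50 0.20 0.20
0 0.40 0.60 0.20 0.20
1 0.30 0.30 0.10 0.10
1 0.70 0.70 0.60 0.50
"""
stats = area_spread_per_class(parse_yolo_labels(sample))
for cls, (m, cv) in sorted(stats.items()):
    print(f"class {cls}: mean area {m:.3f}, CV {cv:.2f}")
```

A high coefficient of variation for a class would flag it for relabeling, although some spread is expected when the objects themselves differ in size.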
Questions:
- Should I train the classification as two outputs in one model (rule type + good/bad), or train two separate classification models?
- What’s the best way to integrate object detection into this pipeline so it only runs for Not Good cases?
- Any model or workflow recommendations for combining classification and detection efficiently?
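To clarify the first question, by "two outputs in one model" I mean a shared backbone with a rule head and a Good / Not Good head, trained on the sum of both losses. The sketch below shows only that loss arithmetic, written without a framework so it stays self-contained. In practice the framework's built-in cross-entropy and binary cross-entropy losses would replace these functions, and `good_weight` is a hypothetical balancing knob, not something I have tuned.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for the rule head (multi-class), computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def binary_cross_entropy(logit, target):
    """BCE on a raw logit for the Good / Not Good head; target is 1.0 for Good, 0.0 for Not Good."""
    # Stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def joint_loss(rule_logits, rule_target, good_logit, good_target, good_weight=1.0):
    """Sum of both head losses; good_weight balances the two tasks."""
    rule_loss = softmax_cross_entropy(rule_logits, rule_target)
    good_loss = binary_cross_entropy(good_logit, good_target)
    return rule_loss + good_weight * good_loss


# One made-up example: three rule sets, true rule is index 1, image is Not Good (0.0).
loss_confident = joint_loss([0.1, 4.0, -1.0], 1, -3.0, 0.0)
loss_wrong = joint_loss([4.0, 0.1, -1.0], 1, 3.0, 0.0)
print(loss_confident < loss_wrong)
```

The alternative, two separate models, would train each loss on its own backbone, which is the trade-off I’m asking about.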
Thanks in advance for your insights!