Combining Multi-Output Image Classification with Object Detection for Environment Inspection

Hi all,

I’m working on a computer vision project for environment inspection.
The application takes each photo from a fixed, specified angle, and the dataset follows the same angle requirement, so the images stay consistent.

Objective:

  • First, determine which rule set applies to that photo (e.g., Rule 1, Rule 2, etc.).

  • Then, determine if the environment is Good or Not Good according to that rule.

  • If Not Good, detect and locate the objects that cause the violation.

Workflow I’m considering:

  1. Step 1 — Image Classification:

    • Multi-class classification for the rule set (Rule 1, Rule 2, etc.).

    • Binary classification for Good vs Not Good (could be another output head of the same model).

  2. Step 2 — Object Detection (only if Not Good):

    • Detect items that cause the problem, based on the specific rule set identified earlier.
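To make the intended control flow concrete, here is a minimal sketch of the two-step workflow. It assumes two hypothetical callables, `classify` (returns the rule name and a Not Good probability) and `detect` (returns boxes for a given rule); neither name comes from a real library, and the 0.5 threshold is only a placeholder. The stub models at the bottom exist purely to show the gating.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class InspectionResult:
    rule: str
    is_good: bool
    violations: List[Box] = field(default_factory=list)


def inspect(
    image,
    classify: Callable[[object], Tuple[str, float]],
    detect: Callable[[object, str], List[Box]],
    not_good_threshold: float = 0.5,  # assumed value, tune on validation data
) -> InspectionResult:
    """Run Step 1, then run Step 2 only when the image is judged Not Good."""
    rule, p_not_good = classify(image)
    if p_not_good < not_good_threshold:
        # Good: the detector is never called.
        return InspectionResult(rule=rule, is_good=True)
    return InspectionResult(rule=rule, is_good=False, violations=detect(image, rule))


# Stub models, for illustration only.
detector_calls = []


def fake_classify(image):
    return ("Rule 1", 0.9 if image == "messy" else 0.1)


def fake_detect(image, rule):
    detector_calls.append(rule)
    return [(10.0, 20.0, 50.0, 60.0)]


clean_result = inspect("clean", fake_classify, fake_detect)
messy_result = inspect("messy", fake_classify, fake_detect)
print(clean_result)
print(messy_result)
```

Passing the predicted rule into `detect` leaves room for either one detector per rule or a single detector whose outputs are filtered by rule; the sketch does not commit to either.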

What I’ve tried so far:

I tried the following before negotiating for more consistent data, when the dataset was still very varied. I can now create a new dataset, but I’m still unsure how to combine classification and detection.

  1. Image Classification (ConvNeXt)

    • Built a binary classification dataset: Good / Not Good.

    • Training loss and accuracy looked good.

    • Result:

      • Works in some cases (e.g., empty table = Good).

      • Fails in others (e.g., full table that’s still acceptable = classified as Not Good).

      • Seems to overfit to simple visual cues like “clutter = bad.”

  2. Object Detection (YOLO)

    • Labeled Not Good examples with bounding boxes showing the issues.

    • Trained YOLO to only detect Not Good objects (no detection = Good).

    • Result:

      • Very poor training accuracy.

      • Main problem seems to be inconsistent bounding boxes — varied size, position, and coverage across images.

      • Dataset is too inconsistent for the model to learn clear patterns.
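One way to check the bounding-box inconsistency before relabeling is to measure how much box size varies per class. The sketch below assumes the labels are in the usual YOLO text format (`class x_center y_center width height`, all normalized to 0..1); the function name `box_area_spread` and the sample lines are made up for illustration. A high coefficient of variation can also come from objects that genuinely differ in size, so treat it as a hint rather than proof of bad labels.

```python
from collections import defaultdict
from statistics import mean, pstdev


def box_area_spread(label_lines):
    """Return {class_id: (mean_area, coefficient_of_variation)} for YOLO-format lines."""
    areas = defaultdict(list)
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cls, _xc, _yc, w, h = parts
        areas[int(cls)].append(float(w) * float(h))

    spread = {}
    for cls, values in areas.items():
        m = mean(values)
        cv = pstdev(values) / m if m > 0 else 0.0
        spread[cls] = (m, cv)
    return spread


# Made-up sample: class 0 is labeled consistently, class 1 is not.
sample_labels = [
    "0 0.50 0.50 0.20 0.20",
    "0 0.52 0.48 0.20 0.20",
    "1 0.30 0.30 0.10 0.10",
    "1 0.70 0.60 0.60 0.50",
]
stats = box_area_spread(sample_labels)
for cls, (m, cv) in sorted(stats.items()):
    print(f"class {cls}: mean area={m:.3f}, cv={cv:.2f}")
```

The same loop could be extended to the box centers to check position as well as size.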

Questions:

  1. Should I train one classification model with two outputs (rule type + Good/Not Good), or two separate classification models?

  2. What’s the best way to integrate object detection into this pipeline so it only runs for Not Good cases?

  3. Any model or workflow recommendations for combining classification and detection efficiently?

Thanks in advance for your insights!


I think it would be better to try two outputs in one model first. What the model learns for one task may also give it useful clues for the other.
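As a rough illustration of the "two outputs in one model" idea, here is a minimal PyTorch sketch: one shared backbone, one head for the rule set, and one head for Good vs Not Good, trained with a summed loss. The tiny convolutional backbone is only a stand-in so the example runs without downloads; a pretrained ConvNeXt feature extractor could take its place. The class name `TwoHeadClassifier`, the `joint_loss` helper, and the `quality_weight` parameter are assumptions for this sketch, not part of any library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoHeadClassifier(nn.Module):
    def __init__(self, num_rules: int, feat_dim: int = 32):
        super().__init__()
        # Stand-in backbone (assumption): swap in a real feature extractor here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.rule_head = nn.Linear(feat_dim, num_rules)  # multi-class: which rule set
        self.quality_head = nn.Linear(feat_dim, 1)       # binary: Not Good logit

    def forward(self, x):
        feats = self.backbone(x)
        return self.rule_head(feats), self.quality_head(feats).squeeze(1)


def joint_loss(rule_logits, quality_logits, rule_target, not_good_target, quality_weight=1.0):
    """Cross-entropy for the rule head plus weighted BCE for the Good/Not Good head."""
    rule_loss = F.cross_entropy(rule_logits, rule_target)
    quality_loss = F.binary_cross_entropy_with_logits(quality_logits, not_good_target)
    return rule_loss + quality_weight * quality_loss


# Random data, only to show shapes and one backward pass.
model = TwoHeadClassifier(num_rules=3)
images = torch.randn(4, 3, 64, 64)
rule_target = torch.tensor([0, 1, 2, 1])
not_good_target = torch.tensor([0.0, 1.0, 1.0, 0.0])  # 1.0 means Not Good

rule_logits, quality_logits = model(images)
loss = joint_loss(rule_logits, quality_logits, rule_target, not_good_target)
loss.backward()
print(rule_logits.shape, quality_logits.shape, loss.item())
```

If the two tasks turn out to interfere with each other, the same heads can be split into two separate models later without changing the dataset labels.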