How do you deal with missing or incomplete datasets in computer vision?

Hey everyone!
I’m curious how people here handle dataset shortages for object detection / segmentation projects (YOLO, Mask R-CNN, etc.).

A few quick questions:

  1. How often do you run into a lack of good labeled data for your models?

  2. What do you usually do when there’s no dataset that fits — collect real data, label manually, or use synthetic/simulated data?

  3. Have you ever tried generating synthetic data (Unity, Unreal, etc.) — did it actually help?

Would love to hear how different teams or researchers deal with this.


The Hugging Face Discord has several channels dedicated to datasets, and if your field is science, there’s also the Hugging Science Discord, so you may get more reliable answers by asking there.

It’s rare for datasets to be sufficiently complete from the start, so synthetic datasets are usually a valid approach.
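
For what it's worth, synthetic data doesn't have to mean a full Unity or Unreal pipeline. Below is a minimal sketch of one simple variant, copy-paste compositing: paste an object patch onto a background at a random position and write out the matching box in YOLO's normalized `class x_center y_center width height` format. Everything here is an assumption for illustration: the function names `paste_object` and `make_synthetic_samples` are hypothetical, images are plain nested lists so the snippet has no dependencies, and class id 0 is arbitrary. A real pipeline would load actual images and add scaling, blending, and occlusion.

```python
import random


def paste_object(background, obj, rng):
    """Paste `obj` onto a copy of `background` at a random position.

    Both images are 2D lists of grayscale pixel values (hypothetical
    stand-ins for real image arrays). Returns (image, yolo_box), where
    yolo_box is (x_center, y_center, width, height), normalized to [0, 1].
    """
    bg_h, bg_w = len(background), len(background[0])
    obj_h, obj_w = len(obj), len(obj[0])

    # Pick a top-left corner so the object fits fully inside the background.
    top = rng.randint(0, bg_h - obj_h)
    left = rng.randint(0, bg_w - obj_w)

    # Copy the background so the original is left untouched.
    image = [row[:] for row in background]
    for dy in range(obj_h):
        for dx in range(obj_w):
            image[top + dy][left + dx] = obj[dy][dx]

    # The label comes for free because we know exactly where we pasted.
    box = (
        (left + obj_w / 2) / bg_w,
        (top + obj_h / 2) / bg_h,
        obj_w / bg_w,
        obj_h / bg_h,
    )
    return image, box


def make_synthetic_samples(background, obj, n, seed=0):
    """Generate `n` (image, yolo_box) pairs with a fixed seed."""
    rng = random.Random(seed)
    return [paste_object(background, obj, rng) for _ in range(n)]


# Toy inputs: a 64x48 black background and an 8x6 white "object".
background = [[0] * 64 for _ in range(48)]
obj = [[255] * 8 for _ in range(6)]

samples = make_synthetic_samples(background, obj, n=3)
for _image, (xc, yc, w, h) in samples:
    # One YOLO label line per image; class id 0 is an arbitrary choice.
    print(f"0 {xc:.4f} {yc:.4f} {w:.4f} {h:.4f}")
```

The main appeal is that the labels are exact by construction, so no manual annotation is needed. Whether it actually helps depends on how close the composites look to your real images, so it's worth validating on a small hand-labeled real set.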