I have a use case for Mask2Former where instance annotations must overlap.
For example, I have to detect "cracks" and "reparations", as well as "cracks" within "reparations", so "crack" instances will overlap "reparation" instances.
This falls under the instance segmentation paradigm, which models like Mask2Former support natively.
The documentation suggests storing the mask in a single image:
- The Red channel is used to store the Label ID
- The Green channel is used to store the Instance ID
Ref: https://github.com/huggingface/transformers/tree/main/examples/pytorch/instance-segmentation
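To make the limitation concrete, here is a minimal sketch of that encoding as I understand it (the sizes, IDs, and filename are made up):

```python
import numpy as np
from PIL import Image

# Single-image encoding as described in the example repo (my reading of it):
# red channel = label ID, green channel = instance ID.
mask = np.zeros((512, 512, 3), dtype=np.uint8)

# Hypothetical instance 1: class 2 ("reparation") over a rectangle.
mask[100:200, 100:300, 0] = 2  # R = label ID
mask[100:200, 100:300, 1] = 1  # G = instance ID

# Hypothetical instance 2: class 1 ("crack") inside the reparation.
# It overwrites the overlapping pixels: this format cannot keep both.
mask[150:160, 120:280, 0] = 1
mask[150:160, 120:280, 1] = 2

Image.fromarray(mask).save("annotation.png")
```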
As the sketch shows, this encoding does not allow overlapping instances, since each pixel can carry only one class and one instance ID.
AutoImageProcessor then decodes this into mask_labels and class_labels, where mask_labels is (if I'm not mistaken) an N×H×W binary tensor, N being the number of instances, and class_labels is an integer tensor of shape (N,) storing the label ID of each instance.
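For reference, here is a minimal sketch of that decoding path with a toy instance map; the checkpoint name is just an example and the instance/label IDs are made up:

```python
import numpy as np
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "facebook/mask2former-swin-tiny-coco-instance"
)

# Toy (H, W) instance map: pixel value = instance ID, 0 = background.
image = Image.new("RGB", (512, 512))
instance_map = np.zeros((512, 512), dtype=np.uint8)
instance_map[100:200, 100:300] = 1
instance_map[150:160, 120:280] = 2  # overwrites the overlap with instance 1

inputs = processor(
    images=image,
    segmentation_maps=instance_map,
    instance_id_to_semantic_id={1: 2, 2: 1},  # instance ID -> label ID
    ignore_index=0,                           # treat 0 as background
    return_tensors="pt",
)
print(inputs["mask_labels"][0].shape)  # (N, H', W') binary masks after resizing
print(inputs["class_labels"][0])       # (N,) label IDs, here tensor([2, 1])
```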
I can see a workaround (sketched after the list below). In my Hugging Face dataset, I would create:
- A column for the image
- A column for the list of instance masks (i.e. an N×H×W tensor)
- A column for the list of label IDs, one per instance.
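A minimal sketch of what I mean, using the datasets library (the column names and the fixed 512×512 size are my own choices, not any transformers convention):

```python
import numpy as np
from PIL import Image
from datasets import Dataset, Features, Image as ImageFeature, Sequence, Array3D, Value

# Two toy binary masks that overlap, plus one label ID per mask.
mask_a = np.zeros((512, 512), dtype=np.uint8)
mask_a[100:200, 100:300] = 1                  # "reparation"
mask_b = np.zeros((512, 512), dtype=np.uint8)
mask_b[150:160, 120:280] = 1                  # "crack", overlaps mask_a

# Hypothetical schema: the first Array3D dimension is None so N can vary per image.
features = Features({
    "image": ImageFeature(),
    "instance_masks": Array3D(shape=(None, 512, 512), dtype="uint8"),
    "instance_labels": Sequence(Value("int64")),
})

ds = Dataset.from_dict(
    {
        "image": [Image.new("RGB", (512, 512))],
        "instance_masks": [np.stack([mask_a, mask_b])],  # (N, H, W), overlap is fine
        "instance_labels": [[2, 1]],                     # label ID per instance
    },
    features=features,
)
```

Because each instance gets its own binary mask, overlaps are represented with no loss.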
Like many other users, I can only export instance segmentation annotations from my annotation software in standard formats such as COCO.
This means that my process is now…
- Export my annotations in COCO or a similar format
- Convert the COCO .json structure into the Hugging Face .jsonl structure
- Store that in the Hugging Face Dataset
- Load the dataset in the training script
- Decode the masks' RLE into the N×H×W mask tensor and the (N,) class tensor (see the sketch after this list)
- Apply my transformations on the tensors
- Feed everything to the Trainer class
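For step 5, this is roughly the decoding I have in mind, based on pycocotools (the function name and tensor dtypes are my own choices):

```python
import numpy as np
import torch
from pycocotools import mask as mask_utils

def coco_anns_to_tensors(annotations, height, width):
    """Decode one image's COCO annotations into the (N, H, W) binary mask
    tensor and (N,) class tensor that Mask2Former expects. Handles polygon,
    uncompressed-RLE, and compressed-RLE segmentations."""
    masks, labels = [], []
    for ann in annotations:
        seg = ann["segmentation"]
        if isinstance(seg, list):                 # polygon(s)
            rles = mask_utils.frPyObjects(seg, height, width)
            rle = mask_utils.merge(rles)
        elif isinstance(seg["counts"], list):     # uncompressed RLE
            rle = mask_utils.frPyObjects(seg, height, width)
        else:                                     # already compressed RLE
            rle = dict(seg)
            if isinstance(rle["counts"], str):
                rle["counts"] = rle["counts"].encode("utf-8")  # pycocotools wants bytes
        masks.append(mask_utils.decode(rle))      # (H, W) uint8
        labels.append(ann["category_id"])
    mask_labels = torch.from_numpy(np.stack(masks)).float()  # (N, H, W)
    class_labels = torch.tensor(labels, dtype=torch.long)    # (N,)
    return mask_labels, class_labels
```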
I'm a bit surprised, because the original Mask2Former implementation in Detectron2 supports loading COCO datasets natively.
Has this been dropped in the Hugging Face implementation, or am I missing something?