Hello everyone! I am a student currently trying to fine-tune the MaskFormer and Mask2Former segmentation models for an instance segmentation task on a custom dataset. The dataset I am preparing consists of two classes: background and unhealthy. Each image in the dataset contains several instances of the unhealthy class. What I am trying to figure out is how to structure the dataset; I am not using the COCO format for the annotations. The tutorials I have found are mainly aimed at semantic segmentation tasks, and I have not found much material on how to prepare custom datasets for these models. Thank you in advance for your help!
Hi @olmobaldoni
I'm not very familiar with image stuff, but this looks promising:
mostly the "Create PyTorch DataLoader" part.
Or are you looking for a tool to annotate your images?
I’ve done some research on the notebook in question, but it seems to rely on a pre-existing dataset. I would like to clarify a couple of things:
- Should a single segmentation mask display all instances, or, for a single image in my dataset, should I have a separate segmentation mask for each instance within that image?
- How should the various masks be encoded? I've read elsewhere that an RGB annotation is used, where the red channel represents the class of the instance and the blue channel distinguishes different instances within the same class. However, I'm not entirely sure about this, and I'm curious about how the remaining green channel is used.
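For what it's worth, if your annotations do follow that RGB convention (red = class id, blue = instance id; this is an assumption about your data, not something MaskFormer requires), decoding it into one binary mask per instance is straightforward. A minimal numpy sketch on a toy 4x4 annotation:

```python
import numpy as np

# Toy RGB annotation: red channel = class id, blue channel = instance id
# (hypothetical encoding). Two instances of class 1 on a 4x4 image.
ann = np.zeros((4, 4, 3), dtype=np.uint8)
ann[0:2, 0:2, 0] = 1  # class 1
ann[0:2, 0:2, 2] = 1  # instance 1
ann[2:4, 2:4, 0] = 1  # class 1
ann[2:4, 2:4, 2] = 2  # instance 2

class_map, inst_map = ann[..., 0], ann[..., 2]

masks, labels = [], []
for class_id in np.unique(class_map):
    if class_id == 0:  # skip background
        continue
    for inst_id in np.unique(inst_map[class_map == class_id]):
        # One binary mask per (class, instance) pair
        masks.append((class_map == class_id) & (inst_map == inst_id))
        labels.append(int(class_id))

print(len(masks), labels)  # 2 instances, both of class 1
```

The end result, a list of binary masks plus a parallel list of class labels, is exactly the shape of target the notebook's collate function produces.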
batch["class_labels"][batch_index]
tensor([ 1, 1, 2, 5, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 9, 10, 13, 13,
17, 28, 36, 51, 51, 51, 51, 51, 51, 53, 87, 87, 90])
batch["mask_labels"][batch_index].shape
torch.Size([31, 512, 512])
According to the notebook, you need a separate mask for each instance, right? The next masks/images in the notebook refer to two windowpane (class_label = 1) instances from the same image (batch_index = 1).
This one should get the cabinet (class_label = 2), and the next one should be a table, etc.:
import numpy as np
from PIL import Image

print("Visualizing mask for:", id2label[batch["class_labels"][batch_index][2].item()])
visual_mask = (batch["mask_labels"][batch_index][2].bool().numpy() * 255).astype(np.uint8)
Image.fromarray(visual_mask)
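Since the masks and class labels line up index for index, you can also loop over all instances in one go instead of indexing them one at a time. A sketch with dummy stand-ins for the notebook's batch (torch tensors behave the same way for this loop; the names `id2label`, `class_labels`, `mask_labels` mirror the notebook's variables):

```python
import numpy as np

# Dummy stand-ins for batch["class_labels"][batch_index] and
# batch["mask_labels"][batch_index] from the notebook.
id2label = {1: "windowpane", 2: "cabinet"}
class_labels = np.array([1, 1, 2])
mask_labels = np.zeros((3, 8, 8), dtype=np.float32)
mask_labels[0, :4, :4] = 1.0  # first windowpane instance
mask_labels[1, 4:, 4:] = 1.0  # second windowpane instance
mask_labels[2, :2, :] = 1.0   # cabinet instance

# One binary mask per instance: labels and masks are parallel arrays.
for i, class_id in enumerate(class_labels):
    visual_mask = (mask_labels[i].astype(bool) * 255).astype(np.uint8)
    print(f"instance {i}: {id2label[int(class_id)]}, mask pixels: {visual_mask.sum() // 255}")
```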
Please read this part as well:
Next, we provide those to the image processor, which will turn the single instance segmentation map into a set of binary masks and corresponding labels. This is the format that MaskFormer expects (as it casts any image segmentation task to this format - also called "binary mask classification").
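Conceptually, that conversion takes a single 2D instance segmentation map (each pixel holding an instance id) plus a mapping from instance ids to semantic classes, and splits it into one binary mask per instance with a matching class label. A rough numpy sketch of what happens under the hood (the real work is done by the image processor; the ids here are hypothetical):

```python
import numpy as np

# 2D instance segmentation map: each pixel stores an instance id (0 = background).
inst_seg = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])
# Mapping from instance id to semantic class id, e.g. 1 = "unhealthy".
instance_id_to_semantic_id = {0: 0, 1: 1, 2: 1}

mask_labels, class_labels = [], []
for inst_id in np.unique(inst_seg):
    if inst_id == 0:  # skip background
        continue
    # Binary mask for this instance, plus its semantic class label
    mask_labels.append((inst_seg == inst_id).astype(np.float32))
    class_labels.append(instance_id_to_semantic_id[inst_id])

print(np.stack(mask_labels).shape, class_labels)  # (2, 4, 4) [1, 1]
```

So for your dataset, one per-image instance map where each lesion gets its own id, plus an id-to-class mapping, is enough; the processor turns that into the binary-mask-classification format MaskFormer trains on.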