Custom dataset MaskFormer

Hello everyone,
I wish you good health!

I am facing an issue while training MaskFormer on a custom dataset.
My data has 6 classes; it is an agricultural dataset for peppers.

I process the data as follows. First, I created the masks using these lines:
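(The snippet assumes the usual imports at the top of the file: os, numpy as np, torch, skimage.draw, and from PIL import Image.)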

def __getitem__(self, idx):
        # Get image info
        image_info = self.images[idx]
        image_id = image_info['id']
        width, height = image_info['width'], image_info['height']

        # Load the image
        # Note: lstrip('/datasets/') would strip any of those *characters* from the left,
        # not the prefix; removeprefix (Python 3.9+) removes the '/datasets/' prefix itself
        relative_path = image_info['path'].removeprefix('/datasets/')
        image_path = os.path.join(self.root_dir, relative_path)
        image = Image.open(image_path).convert("RGB")
        if self.transform:
            image = self.transform(image)

        # Get annotations for the image
        annotations = self.image_id_to_annotations.get(image_id, [])
        segmentations = [anno.get('segmentation', []) for anno in annotations]
        category_ids = [anno['category_id'] for anno in annotations]

        # Create the semantic map
        semantic_map = np.zeros((height, width), dtype=np.uint8)
        for seg, category_id in zip(segmentations, category_ids):
            if category_id == 11:  # Skip category 11
                continue
            for polygon in seg:
                polygon = np.array(polygon).reshape(-1, 2)
                rr, cc = skimage.draw.polygon(polygon[:, 1], polygon[:, 0], semantic_map.shape)
                semantic_map[rr, cc] = category_id

        semantic_map_tensor = torch.tensor(semantic_map, dtype=torch.uint8)

        return {
            'image': image,
            'semantic_map': semantic_map_tensor,
            'image_id': image_id,
            'width': width,
            'height': height,
        }
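
A quick sanity check I run on one sample (a minimal sketch; the dataset class name PepperCocoDataset and its constructor arguments are hypothetical placeholders for my actual code):

import numpy as np

# Hypothetical instantiation; adjust the class name and arguments to your code
dataset = PepperCocoDataset(root_dir="/datasets", annotation_file="annotations.json")
sample = dataset[0]
# The map should contain only 0 (unlabelled) and the raw category IDs (12, 13, ...)
print(np.unique(sample['semantic_map'].numpy()))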

Then, to build the id2label mapping, I used this:


"Needs to be checked later, date 13/01/2025, line: 193 new, and line 194 old"

# id2label = {10:"background", 11: "pepper_kp", 12: "pepper red", 13: "pepper yellow", 14: "pepper green", 15: "pepper mixed", 17: "pepper mixed_red", 18: "pepper mixed_yellow"}

id2label = {11: "background", 12: "pepper red", 13: "pepper yellow", 14: "pepper green", 15: "pepper mixed", 17: "pepper mixed_red", 18: "pepper mixed_yellow"}

# Remap the raw category IDs to contiguous IDs starting at 0
# (label2id here maps old raw ID -> new contiguous ID)
label2id = {old_id: new_id for new_id, old_id in enumerate(sorted(id2label.keys()))}

id2label_remapped = {label2id[old_id]: label for old_id, label in id2label.items()}
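
With the dictionary above, the sorted old IDs [11, 12, 13, 14, 15, 17, 18] collapse onto 0–6, so id2label_remapped becomes {0: 'background', 1: 'pepper red', ..., 6: 'pepper mixed_yellow'}. One thing worth making explicit: the semantic maps built in __getitem__ still carry the raw category IDs, so they need the same remapping before going into the processor. A minimal sketch using a lookup table (unlabelled pixels stay 0, which coincides with the remapped background class, since 11 also maps to 0):

import numpy as np

# 256-entry lookup table: raw category ID -> contiguous ID (everything else -> 0)
lut = np.zeros(256, dtype=np.uint8)
for old_id, new_id in label2id.items():
    lut[old_id] = new_id
semantic_map_contiguous = lut[semantic_map]  # vectorised remap of the whole mask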

id2color = {
    0: '#000000',
    1: '#c7211c',
    2: '#fff700',
    3: '#00ff00',
    4: '#e100ff',
    5: '#ff6600',
    6: '#d1c415',
}

palette = [
    tuple(int(id2color[id].lstrip('#')[i:i+2], 16) for i in (0, 2, 4)) if id in id2color else (0, 0, 0)
    for id in range(len(id2label_remapped))
]
palette = np.array(palette, dtype=np.uint8)
print(palette)
from transformers import MaskFormerImageProcessor

# Create a preprocessor

preprocessor = MaskFormerImageProcessor(
    ignore_index=0,
    reduce_labels=False,
    do_resize=False,
    do_rescale=False,
    do_normalize=False,
)
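
For context, here is how I then run an image/mask pair through it (a minimal sketch; MaskFormerImageProcessor accepts segmentation_maps and returns the mask_labels/class_labels that MaskFormer's forward expects; the image is assumed to be already rescaled and normalised, since those steps are disabled above):

# 'semantic_map_contiguous' holds contiguous class IDs (see the remapping above)
inputs = preprocessor(
    images=image,
    segmentation_maps=semantic_map_contiguous,
    return_tensors="pt",
)
print(inputs["pixel_values"].shape)
print(inputs["class_labels"])  # one tensor of class IDs per image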

Is this the correct way to preprocess the inputs before fine-tuning?

model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-ade",
                                                          id2label=id2label_remapped,
                                                          ignore_mismatched_sizes=True)
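
(ignore_mismatched_sizes=True is needed here because the classification head of the ADE20K checkpoint is sized for its original label set; it gets re-initialised to match the new id2label.)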

Hi, I too am trying to fine-tune MaskFormer for a very similar task and am having problems with the parameter settings. In my case id2label = {0: 'background', 1: 'unhealty'} and I am using:

    self.processor = AutoImageProcessor.from_pretrained(
        'facebook/maskformer-swin-small-coco',
        do_reduce_labels=True,
        ignore_index=255,
        do_resize=False,
        do_rescale=False,
        do_normalize=False,
    )

because I do not want to predict the background (if I understand correctly). But this way the results are really poor and the model does not seem to learn anything.

The only way I got good results was to consider the background, but in that case the model gives too much weight to the background class and does not predict everything else correctly.

The strange thing is that using Mask2Former on the same dataset and with the same id2label I get far better results.


Hey olmo!
Thanks a lot!
I am also trying everything to get good results. Furthermore, I want to add class weights for each class, since I unfortunately have severe class imbalance. But it is not that easy.

Also, I have a weirdly colored background and I don't know why. I am trying to ignore the background, but it is not helping much, unfortunately.


But why do you use reduce labels? I am doing the same but just with ignore_index=0.


I decided to use do_reduce_labels=True together with ignore_index=255, following this discussion about a similar case:

Additionally, I found a similar scenario in a tutorial for fine-tuning Mask2Former, where the config.json file also has do_reduce_labels=True. According to the documentation, the preprocessing for MaskFormer and Mask2Former should be identical.

In the same file:

# We need to specify the label2id mapping for the model
# it is a mapping from semantic class name to class index.
# In case your dataset does not provide it, you can create it manually:
# label2id = {"background": 0, "cat": 1, "dog": 2}
label2id = dataset["train"][0]["semantic_class_to_id"]

if args.do_reduce_labels:
    label2id = {name: idx for name, idx in label2id.items() if idx != 0}  # remove background class
    label2id = {name: idx - 1 for name, idx in label2id.items()}  # shift class indices by -1

From what I gather (though I'm not entirely sure, as the documentation isn't very clear), when you don't want the background to be treated as a segmentable class, the preprocessor replaces the background pixels with the value 255, and this value is then ignored during loss computation.

That is why I set ignore_index=255. I don't think the ignore_index value can be set arbitrarily (e.g., if I have {0: 'garden', 1: 'car', 2: 'tree'} and set ignore_index=1, the 'car' class will be ignored during loss computation).

The parameter do_reduce_labels=True ensures that classes start from 0 and increment upward, which is why they are shifted by -1.
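
A tiny worked example of that shift, based on my understanding of the preprocessing (0 becomes the ignore value, real classes move down by one):

import numpy as np

mask = np.array([[0, 1],
                 [2, 0]])
reduced = mask.copy()
reduced[mask == 0] = 255  # background -> ignore value
reduced[mask != 0] -= 1   # real classes shift by -1
print(reduced)
# [[255   0]
#  [  1 255]]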


Example (models trained for 20 epochs with a learning rate of 5e-5)

Test Image:


Preprocessor for MaskFormer:

self.processor = AutoImageProcessor.from_pretrained(
    "facebook/maskformer-swin-small-coco",
    do_reduce_labels=True,
    ignore_index=255,
    do_resize=False,
    do_rescale=False,
    do_normalize=False,
)

Results with MaskFormer:

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_loss           1.0081120729446411
        test_map           0.038004860281944275
       test_map_50          0.06367719173431396
       test_map_75         0.040859635919332504
     test_map_large         0.5004204511642456
     test_map_medium        0.04175732284784317
     test_map_small        0.007470746990293264
       test_mar_1           0.01011560671031475
       test_mar_10          0.05838150158524513
      test_mar_100          0.06329479813575745
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Test Image Result with MaskFormer:


Preprocessor for Mask2Former:

self.id2label = {0: "unhealty"}
self.label2id = {v: int(k) for k, v in self.id2label.items()}
self.processor = AutoImageProcessor.from_pretrained(
    "facebook/mask2former-swin-small-coco-instance",
    do_reduce_labels=True,
    ignore_index=255,
    do_resize=False,
    do_rescale=False,
    do_normalize=False,
)

Results with Mask2Former:

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
        test_loss           15.374979972839355
        test_map            0.44928184151649475
       test_map_50          0.6224347949028015
       test_map_75          0.5011898279190063
     test_map_large         0.8390558958053589
     test_map_medium        0.6270320415496826
     test_map_small         0.32075226306915283
       test_mar_1           0.03526011481881142
       test_mar_10          0.24104046821594238
      test_mar_100          0.5274566411972046
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Test Image Result with Mask2Former:


As you can see, the results are very different, even though the code is identical except for the parts where the model type changes. If you want, I can share the code.


Thaaaaaanks for the details!
I will check and see whether it works for me!

You are right, the documentation isn't clear and the tutorials are not generic enough to be adapted to any case.

In my case the background has index (label) 0, hence I am ignoring index 0 to drop it from the loss calculations during fine-tuning, but without reducing labels.

Two main issues remain for me: the plotting, though maybe I can figure that out myself, and the fine-tuning procedure: which backbone should I use and which hyper-parameter values? I have tried Swin-Base pretrained on ADE20K and also ResNet-50, but no good results so far.

I have also tried the COCO weights for Swin-Base, but they were not that good either.

This is the final result:


My problem now is that the model predicts the wrong class for the background, and I don't know why.


Update: I have tried Mask2Former, and it is so much better by far.



Mask2Former outputs!


I also have a problem with a dominant class: I need to add class weights to the loss to balance the classes. Do you know by any chance how to do it?
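
The only idea I have found so far (not sure it is correct; this is just my assumption from reading the current transformers source, where MaskFormerForInstanceSegmentation keeps its loss in model.criterion with an empty_weight buffer of one entry per class plus a final "no object" slot) is to overwrite that weight vector after loading the model:

import torch

# Assumption: model.criterion.empty_weight has shape (num_labels + 1,)
num_labels = len(id2label_remapped)
class_weights = torch.ones(num_labels + 1)
class_weights[0] = 0.1  # e.g. down-weight the dominant background class
class_weights[-1] = model.config.no_object_weight  # keep the no-object weight
model.criterion.empty_weight = class_weights.to(model.device)

Does that look like a reasonable way to do it?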


Dear Olmo! Thaaaaaaanks a lot for your input, it solved my problem: setting the ignore index to 255. :heart:

I am speechless! Wish you a nice weekend, and good luck with your work; you do impressive things from what I see in the images!

Here are some images! Wish you a nice day!


Hi Mohamed! I'm very happy that you managed to solve the problem! May I just ask whether you resolved the issue with MaskFormer or with Mask2Former? I would also like to ask if you could send the processor you used, along with the id2label mapping. I'm curious how you handle the background class (from the image, it seems that you still have a background class).

I wish you a good weekend as well!


I switched to Mask2Former, but next week I will check whether it also works for MaskFormer. They need to document this better :laughing:

Sure! Next Monday I will send you everything.
We could even have a quick Zoom meeting to talk about it, if you want.
Thanks a lot for your input!


Sorry to bother you, but I am curious to understand why I see this drop in performance when I use MaskFormer instead of Mask2Former. Please send me everything only if it doesn't bother you or take up your time. If I eventually can't work out why it's not working, we can also talk on Zoom!
Thanks again for your kindness! :grin:
