Apply same transform to pixel_values and labels for semantic segmentation

I am trying to apply three torchvision transforms to my dataset for semantic segmentation.

from torchvision.transforms import RandomResizedCrop, RandomHorizontalFlip, ColorJitter

resize_crop = RandomResizedCrop(size=(1024, 1024), scale=(0.5, 1.5), ratio=(0.5, 2.0))
horiz_flip = RandomHorizontalFlip(p=0.5)
color_jitter = ColorJitter(brightness=0.5, contrast=(0.5, 1.5), saturation=(0.5, 1.5), hue=0.05)

I have been following the workflow in this blog post, but unlike the train_transforms() function in the post, I need some of my transforms (the resize/crop and the flip) applied to the label mask as well.

How can I do this in a way that guarantees the same random transform is applied to both the image and the corresponding label mask?


That’s a very nice question! I actually wondered about this myself. For now I’ve created an example script that illustrates how to do it: transformers/examples/pytorch/semantic-segmentation at add_semantic_script · NielsRogge/transformers · GitHub. It’s based on PyTorch’s official segmentation example. As you can see, I define a new class for each transform, which applies it to both the input and the target. This is also what torchvision currently recommends; apparently they haven’t yet figured out how to support paired transforms natively.

However, there’s currently a bug with the Trainer, and I haven’t found the time to work on it further. Feel free to try it out!

Alternatively, you can use another framework such as Albumentations, which supports transforms on inputs and targets out of the box.