Apply same transform to pixel_values and labels for semantic segmentation

I am trying to apply three torchvision transforms to my dataset for semantic segmentation.

from torchvision.transforms import ColorJitter, RandomHorizontalFlip, RandomResizedCrop

resize_crop = RandomResizedCrop(size=(1024, 1024), scale=(0.5, 1.5), ratio=(0.5, 2.0))
horiz_flip = RandomHorizontalFlip(p=0.5)
color_jitter = ColorJitter(brightness=0.5, contrast=(0.5, 1.5), saturation=(0.5, 1.5), hue=0.05)

I have been following the workflow from this blog post, but unlike its train_transforms(), I need to apply some of my transforms (the resize/crop and the flip) to the label mask as well.

How can I do this in a way that guarantees the same random transform is applied to both the image and the corresponding label mask?

Hi,

That’s a very nice question! I actually wondered about this myself. For now, I’ve created an example script that illustrates how to do it: transformers/examples/pytorch/semantic-segmentation at add_semantic_script · NielsRogge/transformers · GitHub. It’s based on PyTorch’s official segmentation example. As you can see, I define a new class for each transform that applies it to both the input and the target. This is currently also how torchvision recommends doing it; apparently they haven’t yet figured out how to support it natively.
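To give a flavor of the pattern, here is a minimal sketch (the class names are my own, not the exact ones from the script): each wrapper samples its random parameters once and then applies the identical operation to both the image and the mask, using nearest-neighbor interpolation for the mask so class ids aren't corrupted.

import random

import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode, RandomResizedCrop


class PairedRandomResizedCrop:
    """Sample one crop and apply it to both the image and the mask."""

    def __init__(self, size, scale, ratio):
        self.size = size
        self.scale = scale
        self.ratio = ratio

    def __call__(self, image, target):
        # Sample the crop parameters once ...
        top, left, height, width = RandomResizedCrop.get_params(image, self.scale, self.ratio)
        # ... then apply the identical crop to both tensors.
        image = F.resized_crop(image, top, left, height, width, self.size, InterpolationMode.BILINEAR)
        # Nearest-neighbor interpolation keeps the mask's class ids intact.
        target = F.resized_crop(target, top, left, height, width, self.size, InterpolationMode.NEAREST)
        return image, target


class PairedRandomHorizontalFlip:
    """Flip the image and the mask together with probability p."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, target):
        # A single coin flip decides for both the image and the mask.
        if random.random() < self.p:
            image = F.hflip(image)
            target = F.hflip(target)
        return image, target


class PairedCompose:
    """Chain paired transforms, threading (image, target) through each one."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target

Note that ColorJitter only touches pixel values, so it doesn't need a paired version: you can wrap it so it transforms the image and passes the mask through unchanged.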

However, there’s currently a bug with the Trainer, and I haven’t found the time to work on it further. Feel free to try it out!

Alternatively, you can use another framework such as Albumentations, which supports applying transforms to inputs and targets out of the box.
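For illustration, a rough Albumentations equivalent of your three transforms could look like the sketch below. Treat the exact arguments as assumptions: argument names vary across Albumentations versions (recent releases take size=(H, W) for RandomResizedCrop instead of height/width), and I've capped scale at 1.0 here since area fractions above 1.0 mostly just fall back to a full-image crop.

import albumentations as A

# Albumentations works on HxWxC numpy arrays; the mask is passed alongside
# the image and automatically receives the same spatial transforms (with
# nearest-neighbor interpolation), while pixel-only transforms like
# ColorJitter skip the mask.
train_transform = A.Compose(
    [
        # Newer Albumentations versions use size=(1024, 1024) instead of height/width.
        A.RandomResizedCrop(height=1024, width=1024, scale=(0.5, 1.0), ratio=(0.5, 2.0)),
        A.HorizontalFlip(p=0.5),
        A.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.05, p=1.0),
    ]
)

augmented = train_transform(image=image, mask=mask)  # image, mask: numpy arrays
pixel_values, labels = augmented["image"], augmented["mask"]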