SAM image size for fine-tuning

deppen8 · May 12, 2023, 3:01pm

I am trying to fine-tune the Segment Anything (SAM) model following the recently-posted demo notebook (thanks, @nielsr and @ybelkada !).

I am trying to use 1024x1024 pixel images and masks, but when I try to calculate the loss with loss = seg_loss(predicted_masks, ground_truth_masks.unsqueeze(1)), I get an error from monai:

AssertionError: ground truth has different shape (torch.Size([1, 1, 1024, 1024])) from input (torch.Size([1, 1, 256, 256]))

Obviously, somewhere in the model() call, the input 1024x1024 tensor is getting downsampled to 256x256. In the example notebook, the input image is 256x256 already, so the mask is as well.

I am wondering what is the best way to handle this. Should I simply downsample my masks before calculating the loss or is there some parameter I can change so that I can use 1024x1024 masks?

nielsr · May 13, 2023, 9:17am

Hi,

Thanks for your interest in my notebook! You could use torch.nn.functional.interpolate to interpolate the predicted masks to the appropriate size before calculating the loss:

from torch import nn

predicted_masks = nn.functional.interpolate(predicted_masks,
                size=(1024, 1024),
                mode='bilinear',
                align_corners=False)

This is also used in the postprocessing method of SamImageProcessor.
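
For anyone weighing the two options raised in the original question, here is a minimal sketch of both directions: upsampling the prediction to the mask size, or downsampling the mask to the prediction size. The tensors are random stand-ins with the shapes from the error message, and `binary_cross_entropy_with_logits` is only a stand-in for the monai loss used in the notebook (this assumes the predicted masks are logits).

```python
import torch
import torch.nn.functional as F

# Random stand-ins with the shapes reported in the thread (not real data).
predicted_masks = torch.randn(1, 1, 256, 256)                   # low-res logits from SAM
ground_truth_masks = (torch.rand(1, 1024, 1024) > 0.5).float()  # full-res binary mask

# Option 1: upsample the prediction to the ground-truth size.
upsampled_pred = F.interpolate(
    predicted_masks,
    size=ground_truth_masks.shape[-2:],
    mode="bilinear",
    align_corners=False,
)

# Option 2: downsample the ground truth to the prediction size.
# Nearest-neighbour keeps the mask values binary.
downsampled_gt = F.interpolate(
    ground_truth_masks.unsqueeze(1),
    size=predicted_masks.shape[-2:],
    mode="nearest",
)

# Stand-in loss; the notebook uses a monai loss instead.
loss_full = F.binary_cross_entropy_with_logits(upsampled_pred, ground_truth_masks.unsqueeze(1))
loss_low = F.binary_cross_entropy_with_logits(predicted_masks, downsampled_gt)
```

Option 1 keeps the full-resolution mask detail in the loss at the cost of more memory; option 2 is cheaper but discards detail finer than the 256x256 grid.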

gobleanMW · June 5, 2023, 12:15pm

Hi @nielsr
I was experimenting with the same notebook and have trained on a custom dataset. Now I want the model to predict masks automatically, without any prompts such as bounding boxes. How can I do that?
I tried a few things (input_boxes=None, input_boxes = [0,0,img_width, img_height], using pipeline("mask-generator", model=my_model, processor = my_processor) from here, and loading my checkpoint like in here), but none of them worked.

marcomameli01 · September 5, 2023, 6:27pm

To use the fine-tuned model with the pipeline, save both the processor and the fine-tuned model to the same folder with save_pretrained, then pass the path of that folder to the pipeline.
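
A minimal sketch of that workflow, assuming `model` is a fine-tuned `SamModel` and `processor` its `SamProcessor`. The function name and the `./sam-finetuned` folder are hypothetical, and the task string `"mask-generation"` is my understanding of the current pipeline name, so check it against your transformers version.

```python
def save_and_load_mask_generator(model, processor, output_dir="./sam-finetuned"):
    """Save a fine-tuned SAM model and its processor to one folder,
    then build a mask-generation pipeline from that folder.

    `output_dir` is a hypothetical path chosen for illustration.
    """
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import pipeline

    # Both calls write into the same folder, so the pipeline can find
    # the weights and the preprocessing config together.
    model.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)

    # Assumption: "mask-generation" is the task name for prompt-free SAM inference.
    return pipeline("mask-generation", model=output_dir)
```

Calling the returned pipeline on an image should then generate masks without any box or point prompts.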

rwood-97 · November 8, 2023, 12:09pm

github.com/huggingface/transformers

Add how to preprocess mask for finetuning with SAM

opened 11:53AM - 08 Nov 23 UTC

rwood-97

### Feature request

The [SAM image processor](https://github.com/huggingface/tr…ansformers/blob/main/src/transformers/models/sam/image_processing_sam.py) takes images as input and resizes them so that the longest edge is 1024 (using default values). This is the size expected as input for the SAM model. For inference, this works fine as only the images need resizing, but for fine-tuning as per [this tutorial](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Fine_tune_SAM_(segment_anything)_on_a_custom_dataset.ipynb), you need to resize both your images and your masks, as the SAM model produces `pred_masks` with size 256x256. If I don't resize my masks I get `ground truth has different shape (torch.Size([2, 1, 768, 1024])) from input (torch.Size([2, 1, 256, 256]))` when trying to calculate loss.

To fix this, I've currently written a resize and pad function into my code:

```
import numpy as np
from PIL import Image

def resize_mask(image):
    longest_edge = 256

    # get new size
    w, h = image.size
    scale = longest_edge * 1.0 / max(h, w)
    new_h, new_w = h * scale, w * scale
    new_h = int(new_h + 0.5)
    new_w = int(new_w + 0.5)

    resized_image = image.resize((new_w, new_h), resample=Image.Resampling.BILINEAR)
    return resized_image

def pad_mask(image):
    pad_height = 256 - image.height
    pad_width = 256 - image.width

    # pad on the bottom and right only
    padding = ((0, pad_height), (0, pad_width))
    padded_image = np.pad(image, padding, mode="constant")
    return padded_image

def process_mask(image):
    resized_mask = resize_mask(image)
    padded_mask = pad_mask(resized_mask)
    return padded_mask
```

and then have added this to my definition of SAMDataset:

```
from torch.utils.data import Dataset

class SAMDataset(Dataset):
    def __init__(self, dataset, processor, transform = None):
        self.dataset = dataset
        self.processor = processor
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]

        if self.transform:
            image = self.transform(item["pixel_values"])
        else:
            image = item["pixel_values"]

        # get bounding box prompt (get_bounding_box comes from the tutorial notebook)
        padded_mask = process_mask(item["label"])
        prompt = get_bounding_box(padded_mask)

        # prepare image and prompt for the model
        inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")

        # remove batch dimension which the processor adds by default
        inputs = {k:v.squeeze(0) for k,v in inputs.items()}

        # add ground truth segmentation
        inputs["ground_truth_mask"] = padded_mask

        return inputs
```

This seems to work fine.

What I think would be good is to allow input of masks in the SAM image processor. For example, the [Segformer image processor](https://github.com/huggingface/transformers/blob/v4.35.0/src/transformers/models/segformer/image_processing_segformer.py#L305) takes images and masks as inputs and resizes both to the size expected by the Segformer model.

I have also seen there is a `post_process_masks` method in the SAM image processor, but I am unsure how to implement this in the tutorial I'm following. If you think this is a better way than what I am suggesting, please could you explain where I would add it in the code from the tutorial notebook.

### Motivation

Easier fine-tuning of the SAM model.

### Your contribution

I could try to write a PR for this and/or make a PR to update the [notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Fine_tune_SAM_(segment_anything)_on_a_custom_dataset.ipynb) instead.

Hi, I’ve just opened an issue about this, as I had the same problem. My inputs and masks are not all the same size, so I couldn’t work out how to implement the solution you suggested and just resized my masks instead. I thought it would be good to comment here in case others find this thread.

Ziye-Thomas · April 3, 2024, 6:36am

Hi, nielsr! I tried the method you mentioned in my case, resizing the predicted mask (256 x 256) to the same shape as the ground truth (1080 x 1920), but there are still problems. The most obvious one is that when I visualize the resized mask, its position has shifted significantly compared to the ground truth. This problem does not occur with masks post-processed using processor.image_processor.post_process_masks. Do you have any suggestions for this issue? Or could you share what specific method is used in post_process_masks?
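
One likely explanation, sketched below under the assumption that the processor uses its defaults (resize so the longest edge is 1024, then pad on the bottom and right to 1024x1024): the 256x256 prediction covers the padded square, not just the image. The helper name `sam_mask_geometry` is hypothetical; it only computes how much of the low-resolution mask corresponds to real image content.

```python
def sam_mask_geometry(original_hw, longest_edge=1024, mask_size=256):
    """Return the resized image shape and the part of the low-res mask
    that maps to real image content (the rest is padding).

    Assumes default SAM preprocessing: longest edge resized to
    `longest_edge`, then bottom/right padding to a square.
    """
    h, w = original_hw
    scale = longest_edge / max(h, w)
    resized_hw = (int(h * scale + 0.5), int(w * scale + 0.5))

    # The low-res mask is a scaled-down copy of the padded square.
    ratio = mask_size / longest_edge
    valid_hw = (resized_hw[0] * ratio, resized_hw[1] * ratio)
    return resized_hw, valid_hw


resized_hw, valid_hw = sam_mask_geometry((1080, 1920))
print(resized_hw)  # (576, 1024)
print(valid_hw)    # (144.0, 256.0)
```

If this assumption holds, only the top 144 rows of the 256x256 prediction belong to a 1080x1920 image, so resizing the whole prediction straight to 1080x1920 also stretches the 112 padded rows over the image and displaces the mask. Upsampling in three steps (interpolate to 1024x1024, crop to the resized shape, interpolate to the original size) avoids that, and as far as I can tell that is what post_process_masks does.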
