Fine-tuning Segment Anything and automatic prediction

I am trying to fine-tune the Segment Anything (SAM) model following the demo notebook (credits: @nielsr and @ybelkada).
In my case, I have trained on a custom dataset. Now I want the model to predict masks automatically, without any prompts such as bounding boxes. How can I do this?
I tried a few things, but none of them worked: passing input_boxes=None; passing input_boxes = [0, 0, img_width, img_height]; using pipeline("mask-generator", model=my_model, processor=my_processor) from here; and loading my checkpoint as in the official automatic_mask_generator_example.ipynb.

SAM always expects a prompt, which could be a bounding box, mask, text or point (the authors didn’t release the text prompt capability). SAM just generates a mask given an image + a prompt.


The automatic_mask_generator works by generating many point prompts at fixed positions in the image; SAM then predicts masks for each of those prompts. The predicted masks are post-processed with non-maximum suppression (NMS) to remove duplicate detections:

Specifically, we prompt SAM with a 16×16 regular grid of foreground points resulting in 768 predicted masks (3 per point). Redundant masks are removed by NMS.
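
To make the numbers in that quote concrete, here is a rough sketch of the two steps in plain Python: building a regular grid of point prompts and removing duplicates with greedy NMS. The function names (build_point_grid, scale_points, box_iou, nms) and the IoU threshold are my own choices for illustration, not the library's API, and the NMS here runs on bounding boxes rather than on the masks themselves.

```python
def build_point_grid(points_per_side):
    """Return points_per_side**2 (x, y) points in [0, 1], one at the centre of each grid cell."""
    offset = 1.0 / (2 * points_per_side)
    coords = [offset + i / points_per_side for i in range(points_per_side)]
    return [(x, y) for y in coords for x in coords]


def scale_points(points, width, height):
    """Convert normalised points to pixel coordinates for a given image size."""
    return [(x * width, y * height) for x, y in points]


def box_iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any later box that overlaps a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


grid = build_point_grid(16)            # 16 x 16 regular grid
masks_per_point = 3                    # SAM returns 3 candidate masks per point
total_masks = len(grid) * masks_per_point
pixel_points = scale_points(grid, width=640, height=480)

# Toy example: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)

print(len(grid), total_masks)  # 256 768
print(kept)                    # [0, 2]
```

Each point in pixel_points would be passed to SAM as a single foreground point prompt; the duplicates that neighbouring points produce for the same object are what the NMS step removes.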

So in any case, SAM requires prompts. You could, for instance, first run a zero-shot object detector on your image to get bounding boxes, and then run SAM to get masks given those bounding boxes. That’s what the Grounding DINO-SAM repo does.
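
As a sketch of the glue between the two stages: the helper below takes detector output and turns it into a nested list of box prompts. The detection format (a dict with "box" and "score"), the helper name detections_to_input_boxes, and the score threshold are assumptions for illustration. The nested layout (one list of [x0, y0, x1, y1] boxes per image) is what I understand the processor's input_boxes argument to expect, based on the demo notebook passing input_boxes=[[prompt]]; please check it against your version.

```python
def detections_to_input_boxes(detections, width, height, score_threshold=0.3):
    """Filter detections by score, clamp boxes to the image, and nest them per image.

    detections: list of {"box": (x0, y0, x1, y1), "score": float} (hypothetical format).
    Returns a list with one entry per image; here there is a single image.
    """
    boxes = []
    for det in detections:
        if det["score"] < score_threshold:
            continue  # drop low-confidence detections
        x0, y0, x1, y1 = det["box"]
        # Clamp coordinates to the image bounds.
        x0 = float(max(0, min(x0, width)))
        x1 = float(max(0, min(x1, width)))
        y0 = float(max(0, min(y0, height)))
        y1 = float(max(0, min(y1, height)))
        if x1 > x0 and y1 > y0:  # skip boxes that are empty after clamping
            boxes.append([x0, y0, x1, y1])
    return [boxes]


detections = [
    {"box": (10, 20, 110, 220), "score": 0.9},
    {"box": (-5, 0, 50, 700), "score": 0.5},   # partly outside the image
    {"box": (0, 0, 5, 5), "score": 0.1},       # below the score threshold
]
input_boxes = detections_to_input_boxes(detections, width=640, height=480)
print(input_boxes)
# [[[10.0, 20.0, 110.0, 220.0], [0.0, 0.0, 50.0, 480.0]]]
```

The result would then be passed as the box prompt when preparing inputs for SAM, in place of the hand-drawn or ground-truth boxes used during fine-tuning.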

Thanks for the reply! SAM does require prompts, but I was hoping for some pipeline that does the sampling internally.