OneFormer ID/Labels for Fine-Tuning

Hello Forums,

TL;DR: I'm trying to fine-tune OneFormer on a custom dataset. Training solely off of the ground-truth images and masks works fine, but the model assigns the wrong classes to the segmented regions. Adding the id2label mapping did not work: I got an "index out of bounds" assertion (I could be implementing it wrong).

I was following a method to fine-tune along with the classes, and this is how I tried to do it. I have two separate JSON files, an id2label and a label2id, both containing the classes and ids present in the new dataset. In my train file, I load them like this:


with open("labels/id2label.json", "r") as f:
   id2label = json.load(f)
id2label = {int(k): v for k, v in id2label.items()}
label2id = {v: k for k, v in id2label.items()}

along with my config:

config = OneFormerConfig.from_pretrained("model link", id2label=id2label, label2id=label2id, is_training=True)
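For reference, a minimal sketch (not my exact code) of how that config is then handed to the model; the checkpoint name is a placeholder, and ignore_mismatched_sizes=True is assumed because the classification head changes shape for the new labels:

from transformers import OneFormerForUniversalSegmentation

# "model link" above is a placeholder checkpoint; substitute the real one.
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    config=config,
    ignore_mismatched_sizes=True,  # class head is re-initialized for the new labels
)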

Is there something I am doing wrong with OneFormer? I was aiming for something like the Mask2Former fine-tune (Source 1).

Sources:

  1. Fine Tuning Mask2Former on Custom Dataset
  2. Fine-Tune a Semantic Segmentation Model with a Custom Dataset
  3. Niels Rogge fine-tune for both OneFormer and Mask2Former

Seems it’s caused by is_training=True with OneFormer…?

I’ve taken a look at the linked issues, but neither helped solve my problem. I am wondering how the first issue was resolved solely by the is_training=True argument, since it is already enabled in my version. Here is a snippet of my attempt:

import json
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# Load the id -> label mapping and build the inverse.
with open("labels/id2label.json", "r") as f:
    id2label = json.load(f)
id2label = {int(k): v for k, v in id2label.items()}
label2id = {v: k for k, v in id2label.items()}
num_labels = len(id2label)

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    id2label=id2label,
    label2id=label2id,
    num_labels=num_labels,
    is_training=True,
    ignore_mismatched_sizes=True,  # class head size differs from the checkpoint
)
model.config.use_contrastive_loss = True
# Tell the image processor how many text entries to generate during training.
processor.image_processor.num_text = (
    model.config.num_queries - model.config.text_encoder_n_ctx
)
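For completeness, this is roughly how a training batch is then built with that processor; image and seg_map are placeholders for one sample from my dataset, and the "semantic" task token is an assumption based on my use case:

from PIL import Image
import numpy as np

# Hypothetical sample: an RGB image plus a 2D mask of class ids.
image = Image.open("sample.jpg")
seg_map = np.array(Image.open("sample_mask.png"))

inputs = processor(
    images=image,
    segmentation_maps=seg_map,
    task_inputs=["semantic"],
    return_tensors="pt",
)
outputs = model(**inputs)  # mask_labels/class_labels come from the processor
loss = outputs.loss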

Without the id2label and label2id arguments it does fine-tune on the dataset, but with them I get the following failure:

UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [16,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [16,0,0], thread: [33,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
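In case it helps, here is a sanity check I can run over my ground-truth masks, since device-side index asserts like this usually mean some target id falls outside the valid class range; mask_paths is a placeholder for my dataset's mask files:

import numpy as np
from PIL import Image

# mask_paths is a hypothetical list of ground-truth mask files.
for path in mask_paths:
    seg = np.array(Image.open(path))
    bad = np.unique(seg[seg >= num_labels])
    if bad.size:
        print(path, "contains out-of-range class ids:", bad)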


Hmm… It worked on my end… Could it be that the JSON content is in a label format that PyTorch does not support?

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
#import json
#with open("labels/id2label.json", "r") as f:
#    id2label = json.load(f)
#id2label = {int(k): v for k, v in id2label.items()}
id2label = {0: "zero", 1: "one"}  # hard-coded stand-in for the JSON file
label2id = {v: k for k, v in id2label.items()}
num_labels = len(id2label)
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    id2label=id2label,
    label2id=label2id,
    num_labels=num_labels,
    is_training=True,
    ignore_mismatched_sizes=True,
)
model.config.use_contrastive_loss = True
processor.image_processor.num_text = (
    model.config.num_queries - model.config.text_encoder_n_ctx
)
print(model)
print(processor)

I made my id2label.json file something like this:

{
  "Background": 0,
  "Road": 1
}

The format matches the one provided by the Mask2Former segmentation tutorial: https://huggingface.co/datasets/segments/sidewalk-semantic/blob/main/id2label.json
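Looking at it again, though, the file above maps label -> id, while my loading code ({int(k): v for k, v in ...}) expects id -> label keys. A minimal sketch of a defensive load that normalizes either orientation (assuming the content is as shown above):

import json

with open("labels/id2label.json", "r") as f:
    raw = json.load(f)

# Handle both {"0": "Background"} (id -> label) and
# {"Background": 0} (label -> id) orientations.
if all(str(k).isdigit() for k in raw):
    id2label = {int(k): v for k, v in raw.items()}
else:
    id2label = {int(v): k for k, v in raw.items()}
label2id = {v: k for k, v in id2label.items()}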

What did you mean by it worked on your end? Did my snippet run on your machine?


What did you mean by it worked on your end? Did my snippet run on your machine?

Yes. Yes.

Oh boy. Can you give me a general idea of what your system is? I’m running an RTX 4090, so I’ve never thought of it being an issue, unless it’s a dependency issue I’m unaware of.


Yeah. My env is Windows (raw), Python 3.9 (raw), GeForce RTX 3060Ti 8GB.

accelerate                1.8.1
bitsandbytes              0.45.1
hf-xet                    1.1.5
huggingface-hub           0.33.0
numpy                     1.23.5
peft                      0.14.0
pydantic                  2.10.6
torch                     2.4.0+cu124
torchaudio                2.4.0+cu124
torchvision               0.19.0+cu124
transformers              4.46.3