OneFormer ID/Labels for Fine-Tuning

Hello Forums,

TL;DR: I'm trying to fine-tune OneFormer on a custom dataset. Training solely off of the ground-truth images and masks works fine, but the model assigns the wrong classes to the segmented regions. Adding the id2label mapping did not work: I got an "index out of bounds" assertion (I could be implementing it wrong).

I was following a method to fine-tune along with the classes, and this is how I tried to do it. I have two separate JSON files, an id2label and a label2id, both containing the classes and ids present in the new dataset. In my train file, I load them like this:


with open("labels/id2label.json", "r") as f:
   id2label = json.load(f)
id2label = {int(k): v for k, v in id2label.items()}
label2id = {v: k for k, v in id2label.items()}

along with my config:

config = OneFormerConfig.from_pretrained("model link", id2label=id2label, label2id=label2id, is_training=True)
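For reference, a minimal sketch (not my exact code) of how that config is then handed to the model; the checkpoint name is a placeholder, and ignore_mismatched_sizes=True is assumed because the classification head changes shape for the new labels:

from transformers import OneFormerForUniversalSegmentation

# "model link" above is a placeholder checkpoint; substitute the real one.
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    config=config,
    ignore_mismatched_sizes=True,  # class head is re-initialized for the new labels
)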

Is there something I am doing wrong with OneFormer? I was aiming for something like the Mask2Former fine-tune (Source 1).

Sources:

  1. Fine Tuning Mask2Former on Custom Dataset
  2. Fine-Tune a Semantic Segmentation Model with a Custom Dataset
  3. Niels Rogge fine-tune for both OneFormer and Mask2Former

Seems it’s caused by is_training=True with OneFormer…?

I’ve taken a look at the linked issues, but neither helped solve my problem. I am wondering how the first issue was resolved solely by the is_training=True argument, since it is already enabled in my version. Here is a snippet of my attempt:

import json
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# Load the id -> label mapping and build the inverse.
with open("labels/id2label.json", "r") as f:
    id2label = json.load(f)
id2label = {int(k): v for k, v in id2label.items()}
label2id = {v: k for k, v in id2label.items()}
num_labels = len(id2label)

processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    id2label=id2label,
    label2id=label2id,
    num_labels=num_labels,
    is_training=True,
    ignore_mismatched_sizes=True,  # class head size differs from the checkpoint
)
model.config.use_contrastive_loss = True
# Tell the image processor how many text entries to generate during training.
processor.image_processor.num_text = (
    model.config.num_queries - model.config.text_encoder_n_ctx
)
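For completeness, this is roughly how a training batch is then built with that processor; image and seg_map are placeholders for one sample from my dataset, and the "semantic" task token is an assumption based on my use case:

from PIL import Image
import numpy as np

# Hypothetical sample: an RGB image plus a 2D mask of class ids.
image = Image.open("sample.jpg")
seg_map = np.array(Image.open("sample_mask.png"))

inputs = processor(
    images=image,
    segmentation_maps=seg_map,
    task_inputs=["semantic"],
    return_tensors="pt",
)
outputs = model(**inputs)  # mask_labels/class_labels come from the processor
loss = outputs.loss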

Without the id2label and label2id arguments it does fine-tune on the dataset, but with them I get the following failure:

UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [16,0,0], thread: [32,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cuda\IndexKernel.cu:92: block: [16,0,0], thread: [33,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
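In case it helps, here is a sanity check I can run over my ground-truth masks, since device-side index asserts like this usually mean some target id falls outside the valid class range; mask_paths is a placeholder for my dataset's mask files:

import numpy as np
from PIL import Image

# mask_paths is a hypothetical list of ground-truth mask files.
for path in mask_paths:
    seg = np.array(Image.open(path))
    bad = np.unique(seg[seg >= num_labels])
    if bad.size:
        print(path, "contains out-of-range class ids:", bad)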


Hmm… It worked on my end… Could it be that the JSON content is in a label format that PyTorch does not support?

from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
#import json
#with open("labels/id2label.json", "r") as f:
#    id2label = json.load(f)
#id2label = {int(k): v for k, v in id2label.items()}
id2label = {0: "zero", 1: "one"}  # hard-coded stand-in for the JSON file
label2id = {v: k for k, v in id2label.items()}
num_labels = len(id2label)
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
model = OneFormerForUniversalSegmentation.from_pretrained(
    "shi-labs/oneformer_ade20k_swin_tiny",
    id2label=id2label,
    label2id=label2id,
    num_labels=num_labels,
    is_training=True,
    ignore_mismatched_sizes=True,
)
model.config.use_contrastive_loss = True
processor.image_processor.num_text = (
    model.config.num_queries - model.config.text_encoder_n_ctx
)
print(model)
print(processor)

I made my id2label.json file something like this:

{
  "Background": 0,
  "Road": 1
}

The format matches the one provided by the Mask2Former segmentation tutorial: https://huggingface.co/datasets/segments/sidewalk-semantic/blob/main/id2label.json
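Looking at it again, though, the file above maps label -> id, while my loading code ({int(k): v for k, v in ...}) expects id -> label keys. A minimal sketch of a defensive load that normalizes either orientation (assuming the content is as shown above):

import json

with open("labels/id2label.json", "r") as f:
    raw = json.load(f)

# Handle both {"0": "Background"} (id -> label) and
# {"Background": 0} (label -> id) orientations.
if all(str(k).isdigit() for k in raw):
    id2label = {int(k): v for k, v in raw.items()}
else:
    id2label = {int(v): k for k, v in raw.items()}
label2id = {v: k for k, v in id2label.items()}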

What did you mean by it worked on your end? Did my snippet run on your machine?


What did you mean by it worked on your end? Did my snippet run on your machine?

Yes. Yes.

Oh boy. Can you give me a general idea of what your system is? I’m running an RTX 4090, so I’ve never thought of it being an issue, unless it’s a dependency issue I’m unaware of.


Yeah. My env is Windows (raw), Python 3.9 (raw), GeForce RTX 3060Ti 8GB.

accelerate                1.8.1
bitsandbytes              0.45.1
hf-xet                    1.1.5
huggingface-hub           0.33.0
numpy                     1.23.5
peft                      0.14.0
pydantic                  2.10.6
torch                     2.4.0+cu124
torchaudio                2.4.0+cu124
torchvision               0.19.0+cu124
transformers              4.46.3