Using huggingface models without any other huggingface support?

vmes1 · April 19, 2023, 1:35pm

Hello, I am a beginner to huggingface.

I am working on a project to see how some models/architectures perform on my custom dataset for semantic segmentation. I want to train my models from scratch, with no pretrained weights. I am comparing models like ResNet50, SegFormer and Mask2Former. I load all my images and masks using DataLoader. For ResNet50, I use the ResNet50 FCN provided by torchvision: fcn_resnet50 (Torchvision main documentation).

For Segformer, I found that huggingface provides a Segformer model. So I am just using that. The performance isn’t that great so I am wondering if I am doing something wrong. Is it fine if we use huggingface models without using any other huggingface methods like AutoImageProcessor?
I load the model using:

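from transformers import SegformerConfig, SegformerForSemanticSegmentation

# arch['args'] and data['num_classes'] come from my own experiment config
configuration = SegformerConfig(**dict(arch['args']))
configuration.num_labels = data['num_classes']
model = SegformerForSemanticSegmentation(configuration)  # randomly initialised, no pretrained weights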

Then I get the results using:

for _, images, masks in dataloader:
    images = images.to(self.device, non_blocking=True)
    masks = masks.to(self.device, non_blocking=True)
    outputs = model(pixel_values=images, labels=masks).logits
    outputs = nn.functional.interpolate(outputs, size=masks.shape[-2:], mode="bilinear", align_corners=False)
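
For reference, here is a minimal sketch of what the loop above does with the logits, using random tensors instead of a real model. It assumes the logits come out at a lower resolution than the masks (1/4 here), that the masks hold integer class indices of shape (B, H, W), and that 255 marks pixels to ignore. The helper name `upsample_and_score` and all the shapes are made up for illustration; this is not the library's own code.

```python
import torch
import torch.nn.functional as F


def upsample_and_score(logits, masks, ignore_index=255):
    """Upsample low-resolution logits to the mask size, then score them.

    logits: float tensor of shape (B, C, h, w)
    masks:  long tensor of shape (B, H, W) holding class indices
    """
    upsampled = F.interpolate(
        logits, size=masks.shape[-2:], mode="bilinear", align_corners=False
    )
    # Per-pixel cross entropy; pixels equal to ignore_index are skipped.
    loss = F.cross_entropy(upsampled, masks, ignore_index=ignore_index)
    # Predicted class per pixel, shape (B, H, W).
    preds = upsampled.argmax(dim=1)
    return loss, preds


torch.manual_seed(0)
num_classes = 4
logits = torch.randn(2, num_classes, 16, 16)        # stand-in for model(...).logits
masks = torch.randint(0, num_classes, (2, 64, 64))  # stand-in for ground-truth masks
loss, preds = upsample_and_score(logits, masks)
print(loss.item(), preds.shape)
```

If the masks are not integer class indices in the range 0 to num_labels - 1 (for example, one-hot or 0/255 images), the loss will not mean what you expect, so that is worth checking before blaming the model.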

I am also trying to use Mask2Former to train from scratch, but for post processing I need to use Mask2FormerImageProcessor to get the semantic segmentation. I already have processed my images in my DataLoader. What do I do here to just use Mask2Former with my own data?

Thanks.

vmes1 · April 19, 2023, 8:17pm

So for Mask2Former, I am doing this:

configuration = Mask2FormerConfig(**dict(arch['args']))
configuration.num_queries = data['num_classes']
model = Mask2FormerForUniversalSegmentation(configuration)

for _, images, masks in dataloader:
    images = images.to(self.device, non_blocking=True)
    masks = masks.to(self.device, non_blocking=True)
    outputs = model(images, mask_labels=masks)
    outputs = outputs.masks_queries_logits
    outputs = nn.functional.interpolate(outputs, size=masks.shape[-2:], mode="bilinear", align_corners=False)

Is this the correct way to use Mask2Former (model only) for semantic segmentation? It's not performing that well…
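
One thing that may be relevant here: as far as I understand, Mask2Former predicts one mask and one class distribution per query, so `masks_queries_logits` has one channel per query rather than per class, and `num_queries` is not the same setting as the number of classes. A semantic map is usually built by combining `class_queries_logits` with `masks_queries_logits`, which is roughly what the image processor's semantic post-processing does. Below is a minimal sketch of that combination on random tensors. The function name `semantic_map_from_queries` and the shapes are hypothetical, and it assumes the last class channel is the "no object" class.

```python
import torch
import torch.nn.functional as F


def semantic_map_from_queries(class_queries_logits, masks_queries_logits, target_size):
    """Combine per-query class scores and masks into a per-pixel class map.

    class_queries_logits: (B, Q, C + 1), last channel assumed to be "no object"
    masks_queries_logits: (B, Q, h, w)
    target_size:          (H, W) of the ground-truth masks
    """
    # Bring the query masks up to the target resolution.
    masks_queries_logits = F.interpolate(
        masks_queries_logits, size=target_size, mode="bilinear", align_corners=False
    )
    # Class probabilities per query, dropping the "no object" channel: (B, Q, C).
    class_probs = class_queries_logits.softmax(dim=-1)[..., :-1]
    # Per-pixel mask probabilities per query: (B, Q, H, W).
    mask_probs = masks_queries_logits.sigmoid()
    # Sum over queries to get a per-class score map: (B, C, H, W).
    scores = torch.einsum("bqc,bqhw->bchw", class_probs, mask_probs)
    return scores.argmax(dim=1)  # (B, H, W)


torch.manual_seed(0)
batch, queries, classes = 2, 10, 3
class_logits = torch.randn(batch, queries, classes + 1)  # stand-in for outputs.class_queries_logits
mask_logits = torch.randn(batch, queries, 16, 16)        # stand-in for outputs.masks_queries_logits
semantic = semantic_map_from_queries(class_logits, mask_logits, (64, 64))
print(semantic.shape)
```

With this layout the number of queries can stay independent of the number of classes, and the class count would go in `num_labels` instead.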

vmes1 · April 21, 2023, 4:01pm

Hi, can someone check if I have implemented this correctly? I am not getting the best results. Thanks
