Standard procedure for adapting vision encoders to semantic segmentation

I notice that some of the vision encoders do not have a ModelForSemanticSegmentation class, only one for image classification, even though most of them could in principle be trained for semantic segmentation by adding a simple decoder. I don't want to reinvent the wheel, so I was wondering whether there is a standard procedure for creating my own ModelXForSemanticSegmentation that I could follow. (I checked the implementations of the models that do have a semantic segmentation class, but can I simply follow the same recipe?) For example, I'd like to add a semantic segmentation decoder to the ConvNeXt encoder series.
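
To make the question concrete, here is a minimal sketch of what I imagine such a class could look like for ConvNeXt. The class name, the num_labels argument, and the decode head design are all my own placeholders, loosely modeled on what existing heads like SegformerForSemanticSegmentation seem to do: run the backbone, project the coarse features to per-class logits, upsample them bilinearly to the input resolution, and compute a cross-entropy loss when labels are given.

```python
import torch
import torch.nn as nn
from transformers import ConvNextConfig, ConvNextModel
from transformers.modeling_outputs import SemanticSegmenterOutput


class ConvNextForSemanticSegmentation(nn.Module):
    # NOTE: class name, num_labels argument, and decode head are my own
    # placeholders, not an existing transformers API.
    def __init__(self, config: ConvNextConfig, num_labels: int):
        super().__init__()
        self.convnext = ConvNextModel(config)
        # ConvNeXt downsamples by 32x; the final feature map has
        # config.hidden_sizes[-1] channels.
        self.decode_head = nn.Sequential(
            nn.Conv2d(config.hidden_sizes[-1], 256, kernel_size=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, num_labels, kernel_size=1),
        )

    def forward(self, pixel_values, labels=None):
        # last_hidden_state: (batch, hidden_sizes[-1], H/32, W/32)
        features = self.convnext(pixel_values).last_hidden_state
        logits = self.decode_head(features)
        # Upsample the coarse logits back to the input resolution
        logits = nn.functional.interpolate(
            logits, size=pixel_values.shape[-2:],
            mode="bilinear", align_corners=False,
        )
        loss = None
        if labels is not None:
            # 255 is the usual ignore_index for segmentation masks
            loss = nn.functional.cross_entropy(logits, labels, ignore_index=255)
        return SemanticSegmenterOutput(loss=loss, logits=logits)


# quick shape check
config = ConvNextConfig()
model = ConvNextForSemanticSegmentation(config, num_labels=19)
pixel_values = torch.randn(1, 3, 224, 224)
labels = torch.randint(0, 19, (1, 224, 224))
out = model(pixel_values, labels=labels)
print(out.logits.shape, out.loss)  # torch.Size([1, 19, 224, 224]), scalar loss
```

In a "real" version I assume one would subclass ConvNextPreTrainedModel rather than nn.Module so that from_pretrained works, and initialise the backbone from a classification checkpoint such as facebook/convnext-tiny-224. Is that the intended recipe, or is there a more standard pattern to follow?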