Customising pretrained SegFormer

My project is focused on imagery which doesn’t have 3 channels. I would like to be able to load a pretrained TFSegformerModel, and then change the first convolutional layer within it, so that it accepts a different number of channels.

Obviously, this means that the pretrained weights for that layer will be incompatible with the newly sized convolution, but I would like to then finetune this model with randomly initialized weights in that first customized layer, leaving the rest of the pretrained model intact.

Currently, I can create a randomly initialised model by defining a SegformerConfig with a customised ‘num_channels’. However, I cannot find a way to load the pretrained model’s weights into the other layers and then set only the first, randomly initialised layer as trainable.

Any help or advice would be appreciated, thanks. Code below is a useful starting point for the discussion.

from transformers import SegformerConfig, TFSegformerModel
 
custom_config = SegformerConfig(num_channels=6)
custom_model = TFSegformerModel(custom_config)
pretrained_model = TFSegformerModel.from_pretrained("nvidia/mit-b0")

# Now I need a way to get the weights of pretrained_model into custom_model, 
# except the first convolutional layer, which has a different geometry.
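
To make the "different geometry" concrete, here is a small sketch of the kernel shapes involved. It assumes the first patch-embedding convolution uses a 7x7 kernel with 32 output filters (which I believe are the mit-b0 defaults) and TensorFlow's (height, width, in_channels, out_channels) kernel layout. The helper name is made up for illustration and is not part of transformers.

```python
from math import prod


def conv_kernel_shape(in_channels, kernel_size=7, out_channels=32):
    """Kernel shape of a 2D convolution in (H, W, in, out) layout.

    kernel_size=7 and out_channels=32 are assumed mit-b0 values.
    """
    return (kernel_size, kernel_size, in_channels, out_channels)


pretrained_shape = conv_kernel_shape(in_channels=3)
custom_shape = conv_kernel_shape(in_channels=6)

# Print each shape together with its number of weights
print(pretrained_shape, prod(pretrained_shape))
print(custom_shape, prod(custom_shape))

# The shapes differ along the input-channel axis, so the pretrained
# kernel cannot be copied into the 6-channel layer as-is.
shapes_match = pretrained_shape == custom_shape
```

Under these assumptions only the input-channel axis changes, which is why every other layer can keep its pretrained weights.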

I’ll ping our TF experts here, cc @joaogante @amyeroberts.

Maybe also @sayakpaul can give some advice here

Thanks for the ping @nielsr!

Hey @aliFrancis :wave: The suggested approach for this sort of case in TensorFlow, where you want the original pre-trained model except for a few layers, is to load the pre-trained model and then, on the pre-trained model object, replace the layers of interest. You will need to follow the original object structure to access the layers you want to replace (see transformers/modeling_tf_segformer.py at c186e816bdd236ae4e5b64c6e50e8976a910abf5 · huggingface/transformers · GitHub)

Here is some pseudo-code for it that shouldn’t be far off from what you need:

pretrained_model = TFSegformerModel.from_pretrained("nvidia/mit-b0")
# you might want to set the entire model as non-trainable, so everything 
# except your new layer stays frozen

# set the right initialization here; depending on your use case, you might 
# need to copy-paste and redefine a few parts of the class
my_layer_with_six_channels = TFSegformerLayer(...) 
pretrained_model.segformer.encoder.block[0][0] = my_layer_with_six_channels

Thanks for the tips @joaogante, they sent me on the right track.

Rather than using the TFSegformerLayer, it was actually easier to just create two models, one pretrained with the existing config, and then another randomly initialized one with a different config. You can then just substitute out the layer(s) you want to customise in the original.

Also, I think the first layer of the model is actually
…encoder.embeddings[0]

rather than
…encoder.block[0][0]

Code snippet here for anyone interested.

from transformers import SegformerConfig, TFSegformerForSemanticSegmentation

NUM_CHANNELS = 6


# Get pretrained model
segformer_model = TFSegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b1")

# Reuse the pretrained model's configuration (same object, not a copy)
new_config = segformer_model.config

# Modify config's values
new_config.num_channels = NUM_CHANNELS

# Instantiate new (randomly initialized) model
new_model = TFSegformerForSemanticSegmentation(new_config)

# Substitute the first layer of the pretrained model with the modified one
segformer_model.segformer.encoder.embeddings[0] = new_model.segformer.encoder.embeddings[0]

You can also configure the number of input channels directly:

import transformers

model = transformers.AutoModelForSemanticSegmentation.from_pretrained(
  "nvidia/mit-b0",
  num_channels=4,
  num_labels=8,
  ignore_mismatched_sizes=True,
)
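
As far as I understand it, `ignore_mismatched_sizes=True` loads every checkpoint weight whose shape matches the model and leaves the mismatched ones newly initialised. Below is a rough NumPy sketch of that idea, not the actual transformers implementation. The function name, the weight names, and the shapes are all hypothetical stand-ins.

```python
import numpy as np


def load_matching_weights(checkpoint, model_weights):
    """Copy checkpoint arrays into model_weights where name and shape agree.

    Returns the names that were skipped because they were missing from the
    checkpoint or had a different shape; those keep their existing values.
    """
    skipped = []
    for name, target in model_weights.items():
        source = checkpoint.get(name)
        if source is None or source.shape != target.shape:
            skipped.append(name)
            continue
        model_weights[name] = source.copy()
    return skipped


rng = np.random.default_rng(0)

# Toy "pretrained" weights with a 3-channel first convolution (hypothetical names)
checkpoint = {
    "patch_embed/kernel": rng.normal(size=(7, 7, 3, 32)),
    "block_0/dense/kernel": rng.normal(size=(32, 32)),
}

# Toy randomly initialised model with a 6-channel first convolution
model_weights = {
    "patch_embed/kernel": rng.normal(size=(7, 7, 6, 32)),
    "block_0/dense/kernel": rng.normal(size=(32, 32)),
}

skipped = load_matching_weights(checkpoint, model_weights)
print(skipped)  # → ['patch_embed/kernel']
```

Only the first convolution is left at its random initialisation here. Everything else ends up identical to the checkpoint, which is the same outcome as the layer substitution approach earlier in the thread.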