Combining an encoder from one model and a decoder from another for image reconstruction

spencerh · December 15, 2022, 3:01am

Hi! I’m learning the lay of the land around the Hugging Face code, specifically reading the code behind LayoutLMv3 and ViT-MAE.

If my objective is to pre-train with masked image reconstruction, much like ViT-MAE does with its decoder, but with the encoder from another model such as LayoutLMv3, can I modify ViTMAEForPreTraining like this:

class ViTMAEForPreTraining(ViTMAEPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.config = config
        
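        # The original code here was:
        #     self.vit = ViTMAEModel(config)
        # Replace it with an encoder derived from another model, for example
        # class LayoutLMv3Model(LayoutLMv3PreTrainedModel).
        # Encoder is a placeholder name, not an existing class.
        self.encoder = Encoder(config)
        # self.vit no longer exists, so derive num_patches from the config
        # (assumes square images and ViTMAEConfig's image_size / patch_size).
        num_patches = (config.image_size // config.patch_size) ** 2
        self.decoder = ViTMAEDecoder(config, num_patches=num_patches)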

        # Initialize weights and apply final processing
        self.post_init()

My initial hunch is that this won’t work as is, because LayoutLMv3 returns a BaseModelOutput while the encoder of MAE returns a ViTMAEModelOutput, which also includes mask and ids_restore. Does this mean we would also have to create a new ModelOutput class?
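
To make the ids_restore concern concrete, here is a minimal plain-Python sketch of the bookkeeping as I understand it from MAE-style random masking. It is only an illustration under my own assumptions: MaskedEncoderOutput, random_masking and restore_sequence are names I made up, not part of transformers, and the real code works on batched tensors (and handles a CLS token) rather than lists.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedEncoderOutput:
    """Hypothetical output container carrying the MAE masking bookkeeping."""
    kept_patches: List[int]  # original indices of the patches the encoder sees
    mask: List[int]          # per patch, original order: 0 = kept, 1 = masked
    ids_restore: List[int]   # position of each original patch in shuffled order


def random_masking(num_patches: int, mask_ratio: float, seed: int = 0) -> MaskedEncoderOutput:
    rng = random.Random(seed)
    noise = [rng.random() for _ in range(num_patches)]
    # Sorting patch indices by noise gives a random permutation (the shuffle).
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)
    # The argsort of a permutation is its inverse: it undoes the shuffle.
    ids_restore = sorted(range(num_patches), key=ids_shuffle.__getitem__)
    len_keep = int(num_patches * (1 - mask_ratio))
    kept = ids_shuffle[:len_keep]
    # Build the mask in shuffled order, then map it back to original order.
    mask_shuffled = [0] * len_keep + [1] * (num_patches - len_keep)
    mask = [mask_shuffled[i] for i in ids_restore]
    return MaskedEncoderOutput(kept, mask, ids_restore)


def restore_sequence(encoded: List[str], ids_restore: List[int],
                     mask_token: str = "[MASK]") -> List[str]:
    # Decoder side: pad with mask tokens, then unshuffle to original order.
    padded = encoded + [mask_token] * (len(ids_restore) - len(encoded))
    return [padded[i] for i in ids_restore]


out = random_masking(num_patches=8, mask_ratio=0.75)
encoded = [f"enc{p}" for p in out.kept_patches]  # stand-in for encoder outputs
restored = restore_sequence(encoded, out.ids_restore)
```

If this reading is right, the decoder cannot put the mask tokens back in the correct positions without ids_restore, so whatever encoder replaces ViTMAEModel would have to do the masking itself and hand back something like the container above.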

If it is not as simple as this, do I need to create a new embedding class for ViTMAE that takes in encodings like LayoutLMv3 does? Any help would be greatly appreciated, thanks!
