Partially fine-tuning an encoder in an encoder-decoder transformer

Hi. I’m working on an application that uses an encoder-decoder network. I’ve frozen the encoder and fine-tuned a GPT-2 decoder. This can be achieved simply by setting GPT2Config.add_cross_attention = True and passing encoder_hidden_states to GPT-2's forward function. When preparing the model with accelerate, I only pass the decoder to prepare(), and I extract the encoder features externally. My training code looks like this:

import torch
from transformers import GPT2Config, GPT2LMHeadModel
from accelerate import Accelerator

accelerator = Accelerator()
device = accelerator.device

config = GPT2Config()
config.add_cross_attention = True
model = GPT2LMHeadModel.from_pretrained('gpt2', config=config)
model = model.to(device)
model, optimizer, train_loader = accelerator.prepare(model, optimizer, train_loader)

for epoch in range(epochs):
    
    model.train()
    for i, batch in enumerate(train_loader):

        batch = tuple(input_tensor.to(device) for input_tensor in batch)
        encoder_input, input_ids, segment_ids = batch

        # my_encoder is not passed to prepare(); it is frozen and run under @torch.no_grad
        encoder_outputs = my_encoder(encoder_input)
        
        outputs = model(input_ids=input_ids, 
                        token_type_ids=segment_ids, 
                        encoder_hidden_states=encoder_outputs, 
                        **kwargs)
        
        loss = outputs.loss
        accelerator.backward(loss)  # with accelerate, use accelerator.backward instead of loss.backward
        optimizer.step()
        optimizer.zero_grad()

If I want to partially fine-tune my_encoder (say, the last block only), do I have to pass the whole my_encoder to the prepare function? Or should I split it into two parts: the first, which is not trainable (and not passed to prepare), and the second, which will be trained (and passed to prepare)? Of course, either option also requires modifying GPT2LMHeadModel to include the trained part of the encoder, or adding the trainable parameters to the optimizer.
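
To make the first option (passing the whole my_encoder to prepare) more concrete, here is a minimal sketch of the kind of partial freezing I have in mind. It assumes my_encoder exposes its blocks as an nn.ModuleList attribute called blocks; that attribute name, the AdamW optimizer, and the learning rate are just placeholders for illustration:

import torch

# Freeze the whole encoder, then re-enable gradients for the last block only.
for param in my_encoder.parameters():
    param.requires_grad = False
for param in my_encoder.blocks[-1].parameters():  # 'blocks' is a hypothetical attribute name
    param.requires_grad = True

# Build the optimizer over every parameter that still requires gradients,
# drawn from both the decoder and the partially unfrozen encoder.
trainable_params = [
    p for p in list(model.parameters()) + list(my_encoder.parameters())
    if p.requires_grad
]
optimizer = torch.optim.AdamW(trainable_params, lr=5e-5)  # placeholder optimizer / learning rate

# First option: pass the whole encoder through prepare(); the frozen blocks
# simply receive no gradient updates.
model, my_encoder, optimizer, train_loader = accelerator.prepare(
    model, my_encoder, optimizer, train_loader
)

In either case, the @torch.no_grad wrapper around my_encoder would have to be dropped (or restricted to the frozen blocks) so that gradients can flow into the last block during the forward pass.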