Distributed fine-tuning with frozen embedding layers

Hi HF community,

I want to fine-tune a HF checkpoint while freezing the transformer's embedding layers.
With the setup described below, I get the following error when running under DDP with torchrun:
AssertionError: expects all parameters to have same requires_grad

The same script works fine when no weights are frozen.
It also works when weights are frozen but training runs on a single GPU (no DDP).
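To make the failing state concrete, here is a minimal pure-PyTorch sketch of what my freezing code produces: a model whose parameters carry mixed `requires_grad` flags. As I understand the assertion message, this mixed state is exactly what sharded DDP rejects. (`ToyModel` is a made-up stand-in for illustration, not the real ALBERT checkpoint.)

```python
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Stands in for model.albert.embeddings in the real script
        self.embeddings = nn.Embedding(10, 4)
        self.head = nn.Linear(4, 10)

model = ToyModel()

# Freeze the embeddings, as in my training script
for param in model.embeddings.parameters():
    param.requires_grad = False

# The model now holds a mix of frozen and trainable parameters
flags = {p.requires_grad for p in model.parameters()}
print(flags)  # both True and False are present
```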

This is how I instantiate the model before passing it to the Trainer:

```python
model = AlbertForMaskedLM.from_pretrained("Rostlab/prot_albert")
if freeze_embeddings:
    # Freeze the embedding layers only; the rest of the model stays trainable
    for param in model.albert.embeddings.parameters():
        param.requires_grad = False
```

Launch command with the main hyper-parameters:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nnodes=1 --nproc_per_node=8 finetune_protbert.py --fp16 --fp16_opt_level O1 --sharded_ddp zero_dp_2
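One workaround I have been wondering about, but have not yet verified under `--sharded_ddp zero_dp_2`: leave every parameter's `requires_grad` as `True` (so the flags stay uniform) and instead exclude the embedding parameters from the optimizer, so they are never updated. A minimal sketch with a toy module (the toy model and the exclusion logic are my own assumptions, not code from the script):

```python
import torch
import torch.nn as nn

class ToyMLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(10, 4)
        self.head = nn.Linear(4, 10)

model = ToyMLM()

# All requires_grad flags stay True (uniform, so DDP sees no mismatch);
# the embeddings are effectively frozen by omitting them from the optimizer.
frozen_ids = {id(p) for p in model.embeddings.parameters()}
trainable = [p for p in model.parameters() if id(p) not in frozen_ids]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

print(len(trainable))  # number of tensors the optimizer will update
```

If this is sound, I believe the Trainer's `optimizers=(optimizer, scheduler)` argument would let me pass such an optimizer in, though I have not tested that combination with sharded DDP. One downside: gradients are still computed and all-reduced for the excluded parameters, so this trades some memory and communication for uniform flags.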

Is this a known issue? Any hints for solving it would be much appreciated.

Thanks!