You can access the parameter names through `model.state_dict().keys()` and build the optimizer's parameter groups based on the names of the corresponding layers. For example, if you set `optimizer_grouped_parameters = [{'params': [p for n, p in model.named_parameters() if "pooler" in n], 'weight_decay': 0.01}]` and then initialize the optimizer with `optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)`, only the pooler layer will be updated during training.
If you check the weights of the pooler layer after training with `model.pooler.weight`, they will differ from the initial model, while all other layers will keep the same weights as the initial model.
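A minimal sketch of this behavior on a tiny stand-in model (a plain PyTorch module with a hypothetical `pooler` submodule, so the name filter has something to match; loading a real checkpoint is not needed to see the effect):

```python
import torch
from torch.optim import AdamW

# A tiny stand-in for a transformer: an "encoder" and a "pooler" submodule.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(4, 4)
        self.pooler = torch.nn.Linear(4, 4)

model = TinyModel()

# Only parameters whose name contains "pooler" go into the optimizer.
optimizer_grouped_parameters = [
    {'params': [p for n, p in model.named_parameters() if "pooler" in n],
     'weight_decay': 0.01}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)

encoder_before = model.encoder.weight.clone()
pooler_before = model.pooler.weight.clone()

# One dummy training step: gradients flow through both submodules,
# but only the pooler parameters are known to the optimizer.
loss = model.pooler(model.encoder(torch.randn(2, 4))).sum()
loss.backward()
optimizer.step()

print(torch.equal(model.encoder.weight, encoder_before))  # True  (unchanged)
print(torch.equal(model.pooler.weight, pooler_before))    # False (updated)
```

Note that the encoder still receives gradients here; it stays fixed only because its parameters were never handed to the optimizer. If you also want to skip the gradient computation, set `requires_grad = False` on those parameters.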
Thanks for your reply! I think my question is really about how to index the last encoder layer of the ALBERT model. In BertModel, the last encoder layer can be indexed with model.encoder.layer[-1]. But this ALBERT model only has a ModuleList, so I don’t know how to index the last encoder layer.
I am not entirely sure, but since ALBERT applies parameter sharing across layers (see the AlbertConfig documentation: num_hidden_groups (int, optional, defaults to 1) – Number of groups for the hidden layers; parameters in the same group are shared), selectively updating a single layer might not be possible.
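To make the consequence of sharing concrete, here is a sketch in plain PyTorch (a toy analogy, not the actual transformers code): with the default config, ALBERT reuses the same layer parameters at every depth, so "the last layer" is not a separate set of weights you could update on its own.

```python
import torch

# ALBERT-style sharing: one layer object is applied at every depth.
shared_layer = torch.nn.Linear(4, 4)
depth = 12

# The forward pass loops over the same module instead of a stack of 12 layers.
h = torch.randn(2, 4)
for _ in range(depth):
    h = shared_layer(h)

# Even if you store it in a ModuleList, every index points at the
# same parameter tensors, so "layers[-1]" is not distinct from "layers[0]".
layers = torch.nn.ModuleList([shared_layer] * depth)
print(layers[-1].weight is layers[0].weight)  # True
```

If I remember the transformers module layout correctly, the shared layer in AlbertModel lives under model.encoder.albert_layer_groups (a ModuleList of layer groups, each holding its albert_layers), but because of the sharing above, fine-tuning the parameters you find there affects every depth at once, not just the last one.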