If I load a pre-trained language model (say BERT) and train it with a standard PyTorch implementation like the one in the code block below, are the weights of the BERT model updated? And if not, is it recommended to update them (i.e. fine-tune) for a downstream task?
Note: the task involves using the BERT embeddings as input to a clustering algorithm (e.g. KMeans, DBSCAN, etc.)
import torch.nn as nn
import transformers

class Model(nn.Module):
    def __init__(self, name):
        super(Model, self).__init__()
        self.bert = transformers.BertModel.from_pretrained(
            config['MODEL_ID'], return_dict=False
        )
        self.bert_drop = nn.Dropout(0.0)
        self.out = nn.Linear(config['HIDDEN_SIZE'], config['NUM_LABELS'])
        self.model_name = name

    def forward(self, ids, mask, token_type_ids):
        # With return_dict=False the model returns a tuple;
        # o2 is the pooled output of the [CLS] token.
        _, o2 = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)
        bo = self.bert_drop(o2)
        output = self.out(bo)
        return output
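For context on what "updating the weights" means here: in PyTorch, a parameter is only updated by the optimizer if its `requires_grad` flag is `True` (the default for all `nn.Module` parameters, including a loaded `BertModel`). A minimal sketch of how freezing works, using a toy `nn.Linear` module as a stand-in for `self.bert` (the `Toy` class and `freeze_encoder` helper are illustrative names, not part of any library):

```python
import torch.nn as nn

class Toy(nn.Module):
    """Toy model; `encoder` stands in for `self.bert`, `out` for the head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.out = nn.Linear(8, 2)

def freeze_encoder(model):
    # Setting requires_grad=False means no gradients are computed for
    # these parameters, so an optimizer will never update them.
    for p in model.encoder.parameters():
        p.requires_grad = False

model = Toy()
freeze_encoder(model)

# Only the head's parameters remain trainable after freezing.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With a real BERT model you would loop over `model.bert.parameters()` the same way; without such a loop, all BERT weights do receive gradients during training.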