If I load a pre-trained language model (say BERT) and train it with a standard PyTorch implementation like the one in the code block below, are the weights of the BERT model updated? And if not, is it recommended to update them (i.e. fine-tune) for a downstream task?
Note: the task involves using the BERT embeddings as input to a clustering algorithm (e.g. KMeans, DBSCAN, etc.)
import torch.nn as nn
import transformers

class Model(nn.Module):
    def __init__(self, name):
        super(Model, self).__init__()
        self.bert = transformers.BertModel.from_pretrained(
            config['MODEL_ID'], return_dict=False
        )
        self.bert_drop = nn.Dropout(0.0)
        self.out = nn.Linear(config['HIDDEN_SIZE'], config['NUM_LABELS'])
        self.model_name = name

    def forward(self, ids, mask, token_type_ids):
        # With return_dict=False the model returns a tuple;
        # o2 is the pooled output of the [CLS] token.
        _, o2 = self.bert(ids, attention_mask=mask, token_type_ids=token_type_ids)
        bo = self.bert_drop(o2)
        output = self.out(bo)
        return output
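For context on what "updating the weights" means here: in PyTorch, a parameter is only updated by the optimizer if its `requires_grad` flag is `True` (the default for all `nn.Module` parameters, including a loaded `BertModel`). A minimal sketch of how freezing works, using a toy `nn.Linear` module as a stand-in for `self.bert` (the `Toy` class and `freeze_encoder` helper are illustrative names, not part of any library):

```python
import torch.nn as nn

class Toy(nn.Module):
    """Toy model; `encoder` stands in for `self.bert`, `out` for the head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.out = nn.Linear(8, 2)

def freeze_encoder(model):
    # Setting requires_grad=False means no gradients are computed for
    # these parameters, so an optimizer will never update them.
    for p in model.encoder.parameters():
        p.requires_grad = False

model = Toy()
freeze_encoder(model)

# Only the head's parameters remain trainable after freezing.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With a real BERT model you would loop over `model.bert.parameters()` the same way; without such a loop, all BERT weights do receive gradients during training.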