Hello, I have a small amount of labeled data and a much bigger set of unlabeled observations. I want to use the unlabeled samples to continue the pre-training of BERT, and then build a classifier on top of it.
Following this post, I tried to use BertModel.from_pretrained('bert-base-uncased'). Specifically:
import torch
from torch.optim import AdamW
from transformers import BertModel

# HF_BERT_MODEL, train_dataloader and scheduler are defined elsewhere in my script
device = torch.device("cuda")
model = BertModel.from_pretrained(HF_BERT_MODEL)
model.cuda()
optimizer = AdamW(model.parameters(),
                  lr=2e-5,
                  eps=1e-8)
model.train()
# For each batch of training data...
for step, batch in enumerate(train_dataloader):
    b_input_ids = batch[0].to(device)
    b_input_mask = batch[1].to(device)
    b_labels = batch[2].to(device)
    model.zero_grad()
    result = model(b_input_ids,
                   token_type_ids=None,
                   attention_mask=b_input_mask,
                   return_dict=True)
    loss = result.loss
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    scheduler.step()
AttributeError: 'BaseModelOutputWithPoolingAndCrossAttentions' object has no attribute 'loss'
I get the error above.
My question is: how do I do the fine-tuning?
I don’t understand the question. Don’t you need to provide labels in order to calculate a loss? You either provide labels to the model so it can calculate the loss, or you calculate the loss yourself using the output of the model.
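A minimal sketch of the second option, computing the loss yourself from BertModel's output. Assumptions: a tiny, randomly initialised BertConfig so it runs without downloading weights, random token ids standing in for real data, and a hypothetical linear head on pooler_output. With real data you would load BertModel.from_pretrained("bert-base-uncased") instead.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

# Tiny random config so the sketch runs offline (stand-in for bert-base-uncased).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config)
num_labels = 2
# Hypothetical classification head on top of the pooled [CLS] representation.
head = nn.Linear(config.hidden_size, num_labels)
loss_fn = nn.CrossEntropyLoss()

input_ids = torch.randint(0, config.vocab_size, (4, 8))  # batch of 4, length 8
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 1, 0])

outputs = bert(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
logits = head(outputs.pooler_output)  # shape (4, num_labels)
loss = loss_fn(logits, labels)        # BertModel itself never returns a loss
loss.backward()                       # gradients reach both the head and BERT
```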
So in your code above, for training I would expect you to pass the labels to the model, then get the loss as you expect, and then call loss.backward().
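For the first option, note that BertModel's forward() does not accept labels; a model with a task head such as BertForSequenceClassification does, and then result.loss is populated. A minimal sketch of one training step, assuming a tiny randomly initialised config and random inputs in place of from_pretrained and a real dataloader:

```python
import torch
from torch.optim import AdamW
from transformers import BertConfig, BertForSequenceClassification

# Tiny random config so the sketch runs offline; num_labels sizes the classifier head.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)
model.train()

input_ids = torch.randint(0, config.vocab_size, (4, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1, 1, 0])

optimizer.zero_grad()
result = model(input_ids=input_ids, attention_mask=attention_mask,
               labels=labels, return_dict=True)
loss = result.loss  # populated because labels were passed
loss.backward()     # the step missing from the original loop
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
```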
The idea was to continue the pre-training, which, according to the BERT paper, consists of masking words (around 15% of the tokens in the provided texts) and trying to predict the masked tokens. To the best of my knowledge, this process is considered “self-supervised”, and therefore you don’t explicitly provide labels; instead, they are inferred from the data. In this case, you still have a loss (otherwise, how could you learn the embeddings?).
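The masking described here can be sketched directly in PyTorch. This follows the 80/10/10 scheme from the BERT paper: of the roughly 15% of positions selected, 80% become [MASK], 10% become a random token, and 10% stay unchanged. Labels keep the original id only at selected positions and are -100 elsewhere, since -100 is the default ignore_index of CrossEntropyLoss. The function name mask_tokens and the token ids in the usage lines are hypothetical.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids, mlm_probability=0.15):
    """Hypothetical helper: return (masked_inputs, labels) for MLM training."""
    labels = input_ids.clone()
    inputs = input_ids.clone()
    # Select ~mlm_probability of the positions, never special tokens.
    prob = torch.full(labels.shape, mlm_probability)
    special = torch.isin(labels, torch.tensor(special_ids))
    prob.masked_fill_(special, 0.0)
    selected = torch.bernoulli(prob).bool()
    labels[~selected] = -100  # unselected positions are ignored by the loss
    # 80% of the selected positions are replaced with [MASK].
    replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    inputs[replace] = mask_token_id
    # Half of the rest (10% overall) get a random token; the remainder stay as-is.
    rand = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~replace
    inputs[rand] = torch.randint(vocab_size, labels.shape)[rand]
    return inputs, labels

# Usage with made-up ids: 0 = [PAD], 2 = [CLS], 3 = [SEP], 4 = [MASK].
torch.manual_seed(0)
ids = torch.randint(5, 100, (2, 16))
ids[:, 0] = 2
ids[:, -1] = 3
inputs, labels = mask_tokens(ids, mask_token_id=4, vocab_size=100, special_ids=[0, 2, 3])
```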
From your question, though, I understand that I might need to mask tokens myself and add the masked tokens as the labels. Am I correct? Could you point me to a notebook that shows an example?
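One way to avoid writing the masking by hand is to pair BertForMaskedLM with DataCollatorForLanguageModeling from transformers; the collator pads each batch, masks tokens with the given mlm_probability, and adds a "labels" tensor, so the model returns a loss. The sketch below assumes a toy vocabulary written to a temporary file and a tiny random config so it runs offline; in practice you would use BertTokenizer.from_pretrained("bert-base-uncased") and BertForMaskedLM.from_pretrained("bert-base-uncased"), and feed batches from your unlabeled texts.

```python
import os
import tempfile
import torch
from transformers import (BertConfig, BertForMaskedLM, BertTokenizer,
                          DataCollatorForLanguageModeling)

# Toy vocabulary so the sketch runs without downloads (assumption, not a real vocab).
words = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "the", "movie", "was", "great", "bad", "plot", "acting", "and"]
vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("\n".join(words) + "\n")
tokenizer = BertTokenizer(vocab_file=vocab_path)

config = BertConfig(vocab_size=len(words), hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertForMaskedLM(config)

# Masks ~15% of non-special tokens per batch; labels are -100 where nothing is predicted.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

texts = ["the movie was great", "the plot was bad and the acting was bad"]  # unlabeled
features = [tokenizer(t) for t in texts]
batch = collator(features)  # padded tensors plus a "labels" entry

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, eps=1e-8)
model.train()
outputs = model(**batch)  # loss is computed because batch contains "labels"
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

With very short texts a batch may end up with no masked positions, in which case the loss is NaN for that step; on a real corpus with longer sequences this is rare.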