Hi everyone,
I’m fine-tuning BERT to perform a NER task.
I’m wondering: if I fine-tune the same BERT model on a POS tagging task as well, could the performance on the NER task improve?
To clarify the question:
```python
import torch.nn as nn
import transformers

import config  # project config holding BASE_MODEL_PATH


class EntityModel(nn.Module):
    def __init__(self, num_tag, num_pos):
        super(EntityModel, self).__init__()
        self.num_tag = num_tag
        self.num_pos = num_pos
        # Shared encoder for both tasks
        self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL_PATH)
        self.bert_drop_1 = nn.Dropout(0.3)
        self.bert_drop_2 = nn.Dropout(0.3)
        # Task-specific heads: one for NER tags, one for POS tags
        self.out_tag = nn.Linear(768, self.num_tag)
        self.out_pos = nn.Linear(768, self.num_pos)

    def forward(self, ids, mask, token_type_ids, target_pos, target_tag):
        # Token embeddings of shape (batch, seq_len, 768); recent
        # transformers versions return a ModelOutput, so take
        # last_hidden_state instead of tuple-unpacking the result
        o1 = self.bert(
            ids, attention_mask=mask, token_type_ids=token_type_ids
        ).last_hidden_state
        bo_tag = self.bert_drop_1(o1)
        bo_pos = self.bert_drop_2(o1)
        tag = self.out_tag(bo_tag)
        pos = self.out_pos(bo_pos)
        loss_tag = loss_fn(tag, target_tag, mask, self.num_tag)
        loss_pos = loss_fn(pos, target_pos, mask, self.num_pos)
        loss = (loss_tag + loss_pos) / 2
        return tag, pos, loss
```
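(`loss_fn` is not shown above; it's the usual masked token-level cross-entropy. A minimal sketch, assuming the targets are integer label ids and `mask` marks the real, non-padded tokens:)

```python
import torch
import torch.nn as nn


def loss_fn(output, target, mask, num_labels):
    # Token-level cross-entropy that ignores padded positions:
    # padded targets are replaced by CrossEntropyLoss's ignore_index.
    lfn = nn.CrossEntropyLoss()
    active = mask.view(-1) == 1           # (batch * seq_len,)
    logits = output.view(-1, num_labels)  # (batch * seq_len, num_labels)
    labels = torch.where(
        active,
        target.view(-1),
        torch.tensor(lfn.ignore_index).type_as(target),
    )
    return lfn(logits, labels)
```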
This is my model. It takes target_pos, target_tag, and the training data (the sentences), which are the same for both tasks.
For example:
Sentence #,Word,POS,Tag
Sentence: 1,Thousands,NNS,O
,of,IN,O
,demonstrators,NNS,O
,have,VBP,O
,marched,VBN,O
,through,IN,O
,London,NNP,B-geo
,to,TO,O
,protest,VB,O
,the,DT,O
,war,NN,O
,in,IN,O
,Iraq,NNP,B-geo
,and,CC,O
,demand,VB,O
,the,DT,O
,withdrawal,NN,O
,of,IN,O
,British,JJ,B-gpe
,troops,NNS,O
,from,IN,O
,that,DT,O
,country,NN,O
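(For reference, a minimal sketch of how a CSV in this layout could be grouped into per-sentence word/POS/tag lists with pandas; the file name and encoding are assumptions for my dataset:)

```python
import pandas as pd

# Hypothetical file name; columns as in the sample above.
df = pd.read_csv("ner_dataset.csv", encoding="latin-1")

# The 'Sentence #' column is only filled on the first token of each
# sentence, so forward-fill it before grouping.
df["Sentence #"] = df["Sentence #"].ffill()

sentences = df.groupby("Sentence #")["Word"].apply(list).values
pos_tags = df.groupby("Sentence #")["POS"].apply(list).values
ner_tags = df.groupby("Sentence #")["Tag"].apply(list).values
```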
Two feed-forward (linear) layers are used for the POS and NER predictions respectively, but the embeddings come from the same BERT encoder. The total loss is then calculated as the average of the losses computed individually for the two tasks.
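(If NER is the primary task and POS is only auxiliary, one variant I've considered is down-weighting the auxiliary loss instead of averaging; `aux_weight` here is a hypothetical hyperparameter, not something from my current code:)

```python
# Inside forward(): weight the auxiliary POS loss instead of averaging.
# aux_weight is a hypothetical hyperparameter (e.g. 0.1-0.5) that would
# be tuned on the NER dev set.
aux_weight = 0.3
loss = loss_tag + aux_weight * loss_pos
```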
So, my question is:
Could the NER accuracy/precision/recall etc. be improved by performing the POS task as well?
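(To check this, I'd compare entity-level precision/recall/F1 with and without the POS head, e.g. with seqeval, which scores whole entity spans rather than individual tokens:)

```python
from seqeval.metrics import classification_report

# Labels as lists of per-token tag strings, one list per sentence.
y_true = [["O", "O", "B-geo", "O"]]
y_pred = [["O", "O", "B-geo", "O"]]
print(classification_report(y_true, y_pred))
```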
Many thanks in advance