Hi,
I fine-tuned BERT on an NER task, and Hugging Face adds a linear classifier on top of the model. I would like to know more details about the classifier architecture, e.g. fully connected + softmax…
Thank you for your help.
Hi! Can you be a little bit more specific about your query?
Just to give you a head start,
In general, NER is a sequence labeling (a.k.a. token classification) problem.
The additional thing you may have to consider for NER is this: for a word that is split into multiple tokens by a BPE- or SentencePiece-like tokenizer, you use the first sub-token as the reference token whose label you predict. Since all the tokens are connected via self-attention, not predicting the remaining sub-tokens of a word is not a problem. In PyTorch, you can skip computing the loss for those tokens (see the ignore_index argument of CrossEntropyLoss) by giving them the label -100 (life is so easy with PyTorch). See the sketch below.
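For example, here is a minimal sketch of that label alignment, assuming a fast tokenizer (the helper logic is my own illustration; word_ids() is the actual fast-tokenizer method, and the label ids are made up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # a fast tokenizer

words = ["HuggingFace", "is", "based", "in", "NYC"]
word_labels = [3, 0, 0, 0, 5]  # example integer label ids per word, e.g. B-ORG, O, O, O, B-LOC

encoding = tokenizer(words, is_split_into_words=True)

labels = []
previous_word_id = None
for word_id in encoding.word_ids():
    if word_id is None:                    # special tokens like [CLS] / [SEP]
        labels.append(-100)
    elif word_id != previous_word_id:      # first sub-token of a word: keep its label
        labels.append(word_labels[word_id])
    else:                                  # remaining sub-tokens of the same word
        labels.append(-100)
    previous_word_id = word_id

# nn.CrossEntropyLoss ignores -100 positions by default (ignore_index=-100)
```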
Apart from that, I didn't find any additional complexity in training an NER model.
Some other implementation details you need to check:
If you still have questions about the architecture, you can follow this; you only have to replace the hierarchical RNN with a Transformer as the encoder.
You can check the following papers for more info:
Please let me know if you have more queries.
Thank you very much for your explanation. It helped me learn a lot.
I printed my model, and I found that it has a classifier. I want to know what its architecture is.
As you can see, the classifier is a single dense layer.
If you are using BertForSequenceClassification, it probably comes from here: transformers/modeling_bert.py at b29eb247d39b56d903ea36c4f6c272a7bb0c0b4c · huggingface/transformers · GitHub
If you are using BertForTokenClassification, it is defined here:
To set the number of labels, change the num_labels variable in your config.
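For reference, here is a condensed sketch of that head, paraphrased from BertForTokenClassification in modeling_bert.py (not the exact source, just the relevant pieces):

```python
import torch.nn as nn

class TokenClassificationHead(nn.Module):
    """Condensed paraphrase of the head inside BertForTokenClassification."""

    def __init__(self, config):
        super().__init__()
        # dropout + a single dense (linear) projection to the label space
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, sequence_output):
        # sequence_output: (batch, seq_len, hidden_size) from the BERT encoder
        logits = self.classifier(self.dropout(sequence_output))
        # (batch, seq_len, num_labels); the softmax happens inside CrossEntropyLoss
        return logits
```

num_labels can also be set directly when loading the model, for example:

```python
from transformers import BertForTokenClassification

# 9 is just an example label count (e.g. CoNLL-2003 with BIO tags)
model = BertForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)
```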
Hi,
can I think of self.classifier = nn.Linear(config.hidden_size, config.num_labels) as a fully-connected layer whose input dimension is config.hidden_size and output dimension is config.num_labels, as shown?
Yes, this is just a linear layer.
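To make the shapes concrete, here is a quick standalone check (768 and 9 are just example values for hidden_size and num_labels):

```python
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 9               # example values (bert-base hidden size, 9 tags)
classifier = nn.Linear(hidden_size, num_labels)

x = torch.randn(2, 128, hidden_size)           # (batch, seq_len, hidden_size) encoder output
logits = classifier(x)
print(logits.shape)                            # torch.Size([2, 128, 9]): one score per label, per token
```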