I am trying to do named entity recognition using a sequence-to-sequence model. My output is simple IOB tags, so I only want to predict probabilities for 3 labels (I, O, or B) for each token.
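To make the label scheme concrete, here is a made-up toy example of what I mean by IOB tags (the sentence is hypothetical, just for illustration):

    tokens = ["Barack", "Obama", "visited", "Paris"]
    labels = ["B",      "I",     "O",       "B"]  # one of 3 IOB labels per token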
I am trying an EncoderDecoderModel using the HuggingFace implementation, with a DistilBert as my encoder and a BertForTokenClassification as my decoder.
First, I load my encoder and decoder and save them to disk:
    from transformers import (
        AutoModelForSequenceClassification,
        BertForTokenClassification,
        EncoderDecoderModel,
    )

    encoder = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
    encoder.save_pretrained("Encoder")

    decoder = BertForTokenClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=3,
        output_hidden_states=False,
        output_attentions=False,
    )
    decoder.save_pretrained("Decoder")

    decoder  # displaying the model prints its architecture
When I check my decoder model as shown, I can clearly see the linear classification layer that has out_features=3:
    ## sample of output:
      )
      (dropout): Dropout(p=0.1, inplace=False)
      (classifier): Linear(in_features=768, out_features=3, bias=True)
    )
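As an extra sanity check on the decoder object loaded above, the head size can also be confirmed programmatically; as far as I know, BertForTokenClassification exposes both the classifier layer and num_labels:

    print(decoder.classifier)  # Linear(in_features=768, out_features=3, bias=True)
    print(decoder.num_labels)  # 3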
However, when I combine the two models in my EncoderDecoderModel, the decoder seems to be converted into a different kind of classifier, now with out_features equal to the size of my vocabulary:
    bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("./Encoder", "./Decoder")
    bert2bert

    ## sample of output:
    (cls): BertOnlyMLMHead(
      (predictions): BertLMPredictionHead(
        (transform): BertPredictionHeadTransform(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        )
        (decoder): Linear(in_features=768, out_features=30522, bias=True)
      )
    )
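For reference, a quick check on the combined model (assuming the bert2bert object from above) confirms that the decoder it holds is no longer a BertForTokenClassification:

    print(type(bert2bert.decoder).__name__)     # a BERT LM-head class, not "BertForTokenClassification"
    print(bert2bert.decoder.config.vocab_size)  # 30522, matching the new out_features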
Why is that? And how can I keep out_features = 3 in my model?