I am trying to do named entity recognition using a sequence-to-sequence model. My output is simple IOB tags, so I only want to predict probabilities for 3 labels (I, O, or B) for each token.
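To make the label scheme concrete, here is a made-up toy example of what I mean by IOB tags (the sentence is hypothetical, just for illustration):

    tokens = ["Barack", "Obama", "visited", "Paris"]
    labels = ["B",      "I",     "O",       "B"]  # one of 3 IOB labels per token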
I am trying an EncoderDecoderModel using the HuggingFace implementation, with a DistilBert as my encoder and a BertForTokenClassification as my decoder.
First, I load my encoder and decoder and save them to disk:
    from transformers import (
        AutoModelForSequenceClassification,
        BertForTokenClassification,
        EncoderDecoderModel,
    )

    encoder = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
    encoder.save_pretrained("Encoder")

    decoder = BertForTokenClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=3,
        output_hidden_states=False,
        output_attentions=False,
    )
    decoder.save_pretrained("Decoder")

    decoder  # displaying the model prints its architecture
When I check my decoder model as shown, I can clearly see the linear classification layer that has out_features=3:
    ## sample of output:
      )
      (dropout): Dropout(p=0.1, inplace=False)
      (classifier): Linear(in_features=768, out_features=3, bias=True)
    )
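As an extra sanity check on the decoder object loaded above, the head size can also be confirmed programmatically; as far as I know, BertForTokenClassification exposes both the classifier layer and num_labels:

    print(decoder.classifier)  # Linear(in_features=768, out_features=3, bias=True)
    print(decoder.num_labels)  # 3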
However, when I combine the two models in my EncoderDecoderModel, the decoder seems to be converted into a different kind of classifier, now with out_features equal to the size of my vocabulary:
    bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("./Encoder", "./Decoder")
    bert2bert

    ## sample of output:
    (cls): BertOnlyMLMHead(
      (predictions): BertLMPredictionHead(
        (transform): BertPredictionHeadTransform(
          (dense): Linear(in_features=768, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        )
        (decoder): Linear(in_features=768, out_features=30522, bias=True)
      )
    )
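For reference, a quick check on the combined model (assuming the bert2bert object from above) confirms that the decoder it holds is no longer a BertForTokenClassification:

    print(type(bert2bert.decoder).__name__)     # a BERT LM-head class, not "BertForTokenClassification"
    print(bert2bert.decoder.config.vocab_size)  # 30522, matching the new out_features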
Why is that? And how can I keep out_features = 3 in my model?