EncoderDecoderModel converts classifier layer of decoder

I am trying to do named entity recognition using a sequence-to-sequence model. My output is simple IOB tags, so I only want to predict probabilities for 3 labels (I, O, B) for each token.

I am trying an EncoderDecoderModel using the Hugging Face implementation, with DistilBERT as my encoder and a BertForTokenClassification as my decoder.

First, I import my encoder and decoder:

from transformers import (
    AutoModelForSequenceClassification,
    BertForTokenClassification,
    EncoderDecoderModel,
)

encoder = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
encoder.save_pretrained("Encoder")

decoder = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                     num_labels=3,
                                                     output_hidden_states=False,
                                                     output_attentions=False)
decoder.save_pretrained("Decoder")
decoder

When I check my decoder model as shown, I can clearly see the linear classification layer that has out_features=3:

## sample of output:
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=3, bias=True)
)

However, when I combine the two models in my EncoderDecoderModel, the decoder seems to be converted into a different kind of classifier: out_features is now the size of my vocabulary.

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("./Encoder","./Decoder")
bert2bert

## sample of output:
(cls): BertOnlyMLMHead(
  (predictions): BertLMPredictionHead(
    (transform): BertPredictionHeadTransform(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    )
    (decoder): Linear(in_features=768, out_features=30522, bias=True)
  )
)

Why is that? And how can I keep out_features = 3 in my model?

The EncoderDecoderModel class is not meant for token classification; it is meant for text generation (like summarization or translation). When you call from_encoder_decoder_pretrained, the decoder checkpoint is reloaded as a causal language model (for a BERT checkpoint, BertLMHeadModel), so the token-classification head you created is discarded and replaced by a language-modeling head whose out_features equals the vocabulary size. That is exactly what your printout shows.
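
You can confirm this by inspecting the combined model; a minimal sketch, reusing the ./Encoder and ./Decoder directories saved above:

from transformers import EncoderDecoderModel

bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained("./Encoder", "./Decoder")

# The decoder has been reloaded as a causal LM, not as your token classifier
print(type(bert2bert.decoder).__name__)     # BertLMHeadModel
print(bert2bert.decoder.config.is_decoder)  # True
print(bert2bert.decoder.cls.predictions.decoder.out_features)  # 30522 (vocab size)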

To do token classification, you can use any xxxForTokenClassification model in the library, such as BertForTokenClassification or RobertaForTokenClassification.
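
For illustration, here is a minimal token-classification sketch; num_labels=3 matches your IOB setup, and the example sentence is just a placeholder:

import torch
from transformers import AutoTokenizer, BertForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=3)

inputs = tokenizer("Alice lives in Paris", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, 3)

# One prediction per token, computed in a single forward pass
predictions = logits.argmax(dim=-1)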

Thanks for the advice. However, am I correct in assuming that with the TokenClassification structure the predictions do not depend on each other, so beam search would not make sense?