Is Sequence to Sequence the best model for paragraph classification?

If I wanted to summarize/classify an entire paragraph into a single output token, could an encoder+decoder model be fine-tuned to produce that behavior? For example, I want to classify an email as ‘Work_Related’.

Hi @moyi-druzi, this should be possible. During training, you feed the emails into the encoder and provide the output token(s) as the labels for the decoder. For example, you train a T5 decoder to generate something like “Work-related” or “Not work-related” and then immediately stop (i.e., emit the end-of-sequence token).
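
A rough sketch of this generate-the-label approach, assuming T5 via the Transformers library (the checkpoint, email text, and label strings below are illustrative, not from the original post):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

email = "Hi team, please send me the Q3 budget before Friday's meeting."

# Training: the decoder's target text is simply the label string.
inputs = tokenizer(email, return_tensors="pt", truncation=True)
labels = tokenizer("Work-related", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # backprop this during fine-tuning

# Inference: generate a few tokens and decode them back into the label.
with torch.no_grad():
    pred_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```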

You can also use an encoder-decoder model for sequence classification by doing something like:

  • Feed the email to both the encoder and the decoder
  • In the decoder, feed the hidden state of the final EOS token into a classification layer
  • The classification layer predicts the class (work-related, not work-related)

BartForSequenceClassification is an example of this.
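
Here is a minimal sketch of that in practice, assuming an illustrative checkpoint (“facebook/bart-base”); note the classification head is randomly initialized until you fine-tune it on your labeled emails:

```python
import torch
from transformers import BartForSequenceClassification, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
# num_labels=2 attaches a fresh (randomly initialized) classification head.
model = BartForSequenceClassification.from_pretrained(
    "facebook/bart-base", num_labels=2
)
model.config.id2label = {0: "Not work-related", 1: "Work-related"}

email = "Don't forget the client call at 3pm."
inputs = tokenizer(email, return_tensors="pt", truncation=True)

# Internally, the decoder hidden state at the final EOS token is passed
# through the classification head to produce the logits.
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])
```

Pooling at the final EOS position works because the decoder's attention is causal, so that last position is the only one that has attended to the entire sequence.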

As for the question in the title: no, probably not. Encoder-only models (e.g., BERT or RoBERTa) are far more common for sequence classification problems, though it is certainly possible with encoder-decoder models.
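
For comparison, the encoder-only equivalent, using AutoModelForSequenceClassification with an assumed BERT checkpoint (again, the head needs fine-tuning on your data before the predictions mean anything):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("Lunch on Saturday?", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # label mapping is defined at fine-tuning time
```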

