Encoder-Decoder vs Decoder-Only Architecture Models

Transformers originally started as encoder-decoder models built for machine translation tasks. Since then, decoder-only transformer models have emerged as strong contenders for 1) translation, 2) better generalization to downstream tasks, and 3) a host of applications ranging from classification to translation to generation.

  1. When should we consider an encoder-decoder style architecture vs a decoder-only architecture?
  2. In what cases can an encoder-decoder architecture outperform a decoder-only architecture?
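For context on the questions above, the core structural difference between the two families comes down to attention masking: a decoder-only model uses a single causal mask everywhere, while an encoder-decoder model combines a bidirectional encoder, a causal decoder, and cross-attention from decoder to encoder. A minimal NumPy sketch of those masks (the function names here are just for illustration, not from any particular library):

```python
import numpy as np

def causal_mask(n):
    # Decoder-only: each position attends only to itself and earlier positions.
    return np.tril(np.ones((n, n), dtype=bool))

def encoder_decoder_masks(src_len, tgt_len):
    # Encoder self-attention is fully bidirectional: every source token
    # can attend to every other source token.
    enc_self = np.ones((src_len, src_len), dtype=bool)
    # Decoder self-attention remains causal, as in a decoder-only model.
    dec_self = causal_mask(tgt_len)
    # Cross-attention: every target position can see the entire encoded source.
    cross = np.ones((tgt_len, src_len), dtype=bool)
    return enc_self, dec_self, cross

enc_self, dec_self, cross = encoder_decoder_masks(src_len=4, tgt_len=3)
print(dec_self.astype(int))
```

The practical upshot is that the encoder gets a full bidirectional view of the input, which is one intuition for why encoder-decoder models can do well on tasks with a clearly separated input (translation, summarization), while decoder-only models process everything through one causal stream.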
