Could someone explain the difference between encoder-only and encoder-decoder models in the context of question answering?

What is the difference between the two types of transformer models (encoder-only vs. encoder-decoder), and how is each used for answering questions?


Encoder-only models are well-suited to extractive question answering, where the model identifies the span of a given passage that best answers the question.
Encoder-decoder models are better for open-ended questions like “Why is the sky blue?” — they can synthesize information and produce a natural, explanatory answer. The encoder builds a representation of the input, and the decoder generates new content based on that representation.
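
To make the contrast concrete, here is a minimal sketch using the `transformers` pipeline API. The checkpoint names are just common examples, not a recommendation:

```python
from transformers import pipeline

# Extractive QA with an encoder-only model: the answer is a span
# copied verbatim from the provided context.
extractive = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",  # example checkpoint
)
result = extractive(
    question="Why is the sky blue?",
    context=(
        "The sky looks blue because air molecules scatter short (blue) "
        "wavelengths of sunlight more strongly than long (red) ones."
    ),
)
print(result["answer"])  # a literal span from the context above

# Generative QA with an encoder-decoder model: the answer is newly
# generated text and does not need to appear in any passage.
generative = pipeline("text2text-generation", model="google/flan-t5-base")
print(generative("Why is the sky blue?")[0]["generated_text"])
```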


Encoder-only transformer models process the input both from left to right and from right to left, so they see the context on both sides of each word. BERT and RoBERTa are models like this. For question answering they are typically used for extractive QA, e.g. given the context “…In 1928, Alexander Fleming discovered penicillin…” and the question “Who discovered penicillin?”, the model points to the span “Alexander Fleming” in the context.
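
Here is a minimal sketch of that span extraction, assuming a SQuAD-style fine-tuned checkpoint (`deepset/roberta-base-squad2` is just one example; any extractive QA model works the same way):

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "deepset/roberta-base-squad2"  # example extractive QA checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "Who discovered penicillin?"
context = "In 1928, Alexander Fleming discovered penicillin."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The model scores every token as a possible start/end of the answer;
# decode the tokens between the best start and the best end.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
# expected output: "Alexander Fleming"
```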

With an encoder-decoder model, the encoder reads the input and then the decoder produces the output token by token. It is better for generative question answering, so it can handle more abstract QA, for example summarizing or rephrasing information instead of just copying a span.
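
And a corresponding generative sketch (again, the checkpoint is just an example):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/flan-t5-base"  # example encoder-decoder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# The decoder generates the answer token by token, so the output can
# rephrase or summarize instead of pointing at a literal span.
prompt = (
    "Answer in one sentence: In 1928, Alexander Fleming discovered "
    "penicillin. Why was this discovery important?"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```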
