Decoder vs Encoder-decoder clarification

I’m following the HF NLP course and I’ve hit a hurdle as I try to establish my ‘mental model’.

Specifically, the tutorial calls out that ‘Decoder models are suited for text generation from a prompt’ and that ‘Encoder-decoder models are suited for input summarization’. Coming from ChatGPT, where “everything is a prompt” and “everything is input”, that distinction is muddy for me.

How do the decoder and encoder-decoder cases compare? Isn’t ‘everything a prompt’ with regard to the input to either a decoder or an encoder-decoder model?

Could you clarify what you’re asking in this part of your post?

Are you asking what’s the difference between a “prompt” (the input to a decoder-only) and the input to an encoder-decoder model?

Essentially, yes. I understand the concept of decoder-only vs. encoder-decoder (sort of). Is there any ‘conceptual’ difference between “prompt” and “input”, or is it simply a semantic one used to draw a distinction between decoder-only and encoder-decoder models?

Sorry for the late response.

> Is there any ‘conceptual’ difference between “prompt” and “input”, or is it simply a semantic one used to draw a distinction between decoder-only and encoder-decoder models?

I think there are both semantic and technical differences. When people say “prompt”, they usually mean something like “an input to a decoder-only model that contains instructions or examples of the task to be completed (aka a ‘few-shot prompt’).” The “input” to an encoder-decoder model, on the other hand, is generally not a “prompt” at all; it’s just the thing the model is supposed to operate on [1] (e.g., a sentence in French that we want the model to ingest and translate into English). So in this sense, “prompt” vs. “encoder-decoder input” is basically a semantic distinction.
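To make that concrete, here’s a minimal sketch using the transformers pipelines (the checkpoints gpt2 and Helsinki-NLP/opus-mt-fr-en are just illustrative choices on my part, not anything the course prescribes):

```python
from transformers import pipeline

# Decoder-only: the "prompt" carries the task description, and the model
# simply continues the text from it.
generator = pipeline("text-generation", model="gpt2")
print(generator("Translate French to English: Bonjour le monde ->",
                max_new_tokens=10))

# Encoder-decoder: the "input" is just the thing to be transformed;
# the task (translation) is baked into the model itself.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
print(translator("Bonjour le monde"))
```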

But there are also technical differences between the input to a decoder-only model and the input to an enc-dec model, simply because they are different architectures. E.g., when you feed an input (regardless of what you call it) into a decoder-only model, it processes the input with unidirectional attention (every token attends only to the previous tokens), while the encoder of an enc-dec model uses bidirectional attention (every input token attends to all the other tokens).
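If it helps, the two attention patterns are easy to visualize as masks. This is just an illustration of the shapes involved, not how any particular implementation builds them internally:

```python
import torch

seq_len = 5

# Decoder-only (causal) attention: token i may attend only to tokens 0..i,
# which corresponds to a lower-triangular mask.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Encoder (bidirectional) attention: every token may attend to every token.
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

print(causal_mask.int())         # 1s on and below the diagonal
print(bidirectional_mask.int())  # all 1s
```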

You can also think of decoder-only models as having one big set of parameters that is used both to process the input (or prompt) and to generate the output. Enc-dec models, meanwhile, effectively have one set of parameters for processing the input (the encoder) and another set for generating the output (the decoder), connected by cross-attention.
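You can actually see this split in the model objects themselves. A quick sketch, using t5-small and gpt2 purely as examples (the .encoder / .decoder attributes are specific to the transformers implementations):

```python
from transformers import T5ForConditionalGeneration, GPT2LMHeadModel

# Enc-dec: separate encoder and decoder stacks.
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
enc_params = sum(p.numel() for p in t5.encoder.parameters())
dec_params = sum(p.numel() for p in t5.decoder.parameters())
# (The token embedding table is shared, so it's counted in both numbers.)
print(f"T5 encoder: {enc_params:,} params, decoder: {dec_params:,} params")

# Decoder-only: one stack handles both the prompt and the generation.
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
print(f"GPT-2: {sum(p.numel() for p in gpt2.parameters()):,} params in one stack")
```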

[1] This is a bit of a simplification, though: there are instruction-tuned T5 models (Flan-T5, Flan-UL2) that are trained to have instructions fed into the encoder, in which case you’d probably also refer to the encoder input as a “prompt”.
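For example (using google/flan-t5-small as an illustrative checkpoint):

```python
from transformers import pipeline

# An instruction-tuned enc-dec model: the encoder input is itself a "prompt".
flan = pipeline("text2text-generation", model="google/flan-t5-small")
print(flan("Translate English to German: The house is wonderful."))
```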
