Decoder vs Encoder-decoder clarification

Sorry for the late response.

Is there any ‘conceptual’ difference between “prompt” and “input” or is it simply semantic in order to draw a distinction between decoder-only and encoder/decoder?

I think there are both semantic and technical differences. When people say “prompt”, they usually mean something like “an input to a decoder-only model that contains instructions or examples of the task to be completed” (a “few-shot prompt”). On the other hand, the “input” to an encoder-decoder model is generally not a “prompt”; it’s just an input that the model is supposed to do something to [1] (e.g., it’s a sentence in French and we want the model to ingest it and translate it into English). So in this sense, “prompt” vs. “encoder-decoder input” is mostly a semantic distinction.
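To make that concrete, here’s a minimal sketch using the Hugging Face `transformers` library. The `gpt2` and `t5-small` checkpoints are just small, convenient stand-ins (gpt2 won’t translate well; the point is only *where* the task description lives):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Decoder-only: the "prompt" itself carries the task framing,
# and the model just continues the text.
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = "French: Bonjour le monde\nEnglish:"
ids = gpt2_tok(prompt, return_tensors="pt").input_ids
print(gpt2_tok.decode(gpt2.generate(ids, max_new_tokens=10)[0]))

# Encoder-decoder: the encoder ingests the input and the decoder generates
# the output from scratch. ("translate English to French" is one of T5's
# pretraining task prefixes.)
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
ids = t5_tok("translate English to French: Hello world", return_tensors="pt").input_ids
print(t5_tok.decode(t5.generate(ids, max_new_tokens=10)[0], skip_special_tokens=True))
```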

But there are real technical differences between the input to a decoder-only model and the input to an enc-dec model, simply because they are different architectures. E.g., when you feed an input (regardless of what you call it) into a decoder-only model, it processes the input with unidirectional (causal) attention: every token attends only to the tokens before it. The encoder of an enc-dec model instead uses bidirectional attention: every input token attends to all of the other input tokens.
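A quick way to see the difference is to write out the two attention masks. A small PyTorch sketch (just the mask shapes, not a full attention implementation):

```python
import torch

seq_len = 5

# Decoder-only / causal mask: position i may attend to positions 0..i only.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.int))
# Encoder (bidirectional) mask: every position may attend to every position.
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.int)

print(causal_mask)
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
print(bidirectional_mask)  # all ones: full bidirectional visibility
```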

You can also think of decoder-only models as having one big set of parameters that is used both to process the input (or prompt) and to generate the output. Meanwhile, enc-dec models effectively have one set of parameters for processing the input (the encoder) and another set for generating the output (the decoder).
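You can actually see this split in the model objects themselves. A sketch, again assuming `transformers`: on T5 the encoder and decoder are literally separate submodules (though note T5 shares its token embeddings between them, so those parameters show up in both counts):

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Decoder-only: one stack handles both prompt processing and generation.
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
print(f"gpt2 total:       {n_params(gpt2):,}")

# Encoder-decoder: distinct parameter sets for encoding vs. decoding.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(f"t5-small encoder: {n_params(t5.encoder):,}")
print(f"t5-small decoder: {n_params(t5.decoder):,}")
```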

[1] This is a bit of a simplification, though: there are instruction-tuned T5 models (Flan-T5, Flan-UL2) that are trained to have instructions fed into the encoder, in which case you’d probably also refer to the input as a “prompt” (see the sketch below).
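For instance, with the public google/flan-t5-small checkpoint, a bare instruction goes straight into the encoder:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# There's no source text to transform here -- the encoder input is itself
# an instruction, so it's natural to call it a "prompt".
ids = tok("Answer yes or no: can penguins fly?", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=5)[0], skip_special_tokens=True))
```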
