What is the difference between prompt tuning and prefix tuning?

I read that prompt tuning and prefix tuning are two effective mechanisms for leveraging frozen language models on downstream tasks. What is the difference between the two, and how do they actually work?

4 Likes

They’re really similar ideas and came out around the same time. Notably, prefix tuning learns soft prompts at every layer of the model, while prompt tuning only modifies the input embeddings. This kind of thing is discussed in the p-tuning line of work.

4 Likes

The introduction in this tutorial notebook has a nice explanation: NeMo/Multitask_Prompt_and_PTuning.ipynb at main · NVIDIA/NeMo · GitHub

Both prompt tuning and prefix tuning are PEFT (parameter-efficient fine-tuning) methods used for tasks like text generation or classification via prompting, and they’re commonly implemented with libraries like Hugging Face PEFT. Prompt tuning learns a few special input tokens, called soft prompts, that you stick at the very beginning of the text. During training only those soft prompt embeddings change while the model’s weights stay frozen, so the trainable part is tiny and training is fast. This works well when your task is close to what the model already knows, because you are basically giving the model a learned hint in the input.
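For a concrete picture, here is a minimal sketch of setting up prompt tuning with Hugging Face PEFT. The model name, the number of virtual tokens, and the initialization text are just placeholders for illustration:

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model_name = "bigscience/bloomz-560m"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the virtual-token embeddings defined here are trainable; the base model stays frozen.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,           # initialize soft prompts from real text
    prompt_tuning_init_text="Classify the sentiment of this review:",
    num_virtual_tokens=8,
    tokenizer_name_or_path=model_name,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # typically a tiny fraction of the model
```

The resulting `peft_model` is trained like any other model, but only the soft prompt embeddings receive gradient updates.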

Prefix tuning learns small helper vectors for every layer inside the Transformer. These act like a short extra memory that is added to the attention mechanism at each layer, so they guide the model more deeply than input-only soft prompts. It is still lightweight compared to full fine-tuning but larger than prompt tuning, and it often performs better on tougher tasks. Choose prompt tuning when you want the smallest adapter and quick task swaps, and choose prefix tuning when you want higher quality and can afford a little more size and latency.
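The prefix tuning counterpart looks almost the same in PEFT; again a rough sketch with a placeholder model, where the real change is the config class (and usually a somewhat larger number of virtual tokens):

```python
from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")  # placeholder

# Trainable prefix key/value vectors are injected into every attention layer;
# the base model weights remain frozen.
config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # more than prompt tuning, still tiny vs. full fine-tuning
```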

1 Like

Prompt tuning adds “soft prompts”, which are not actual words (they are learned embeddings), to the input layer only. Only these prompt embeddings are trained; the LLM stays frozen.
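Mechanically it boils down to something like this rough PyTorch sketch (not tied to any particular library; the class name and the 0.02 init scale are just illustrative):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learned embeddings prepended to the input embeddings; the LM itself stays frozen."""
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)  # (batch, num_virtual_tokens + seq_len, embed_dim)
```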

Prefix tuning, on the other hand, is applied at every layer, which makes it slower but better for complex tasks. Instead of prepending embeddings only at the input, it prepends trainable key and value vectors to each layer’s attention mechanism.
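Conceptually, something like the following happens inside each attention layer. This is a simplified sketch with illustrative names and shapes; real implementations such as PEFT typically feed the prefix in through the model’s cached key/value (past_key_values) path instead:

```python
import torch
import torch.nn as nn

class AttentionPrefix(nn.Module):
    """Trainable key/value vectors prepended to one attention layer's keys and values."""
    def __init__(self, prefix_len: int, num_heads: int, head_dim: int):
        super().__init__()
        self.prefix_k = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)

    def forward(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: (batch, num_heads, seq_len, head_dim) computed by the frozen layer
        batch = k.size(0)
        pk = self.prefix_k.unsqueeze(0).expand(batch, -1, -1, -1)
        pv = self.prefix_v.unsqueeze(0).expand(batch, -1, -1, -1)
        # queries are unchanged; they simply attend over [prefix; sequence]
        return torch.cat([pk, k], dim=2), torch.cat([pv, v], dim=2)
```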

1 Like