What is the difference between prompt tuning and prefix tuning?

I read that prompt tuning and prefix tuning are two effective mechanisms for leveraging frozen language models on downstream tasks. What is the difference between the two, and how do they actually work?

4 Likes

They’re really similar ideas and came out around the same time. Notably, prefix tuning learns soft prompts at every layer of the model, while prompt tuning only modifies the input embeddings. This kind of thing is discussed in the p-tuning line of work.

4 Likes

The introduction in this tutorial notebook has a nice explanation: NeMo/Multitask_Prompt_and_PTuning.ipynb at main · NVIDIA/NeMo · GitHub

Both prompt tuning and prefix tuning are PEFT (parameter-efficient fine-tuning) methods used for tasks like text generation or classification via prompting, and they’re commonly implemented with libraries like Hugging Face PEFT. Prompt tuning learns a few special input tokens, called soft prompts, that you stick at the very beginning of the text. During training only those soft prompt embeddings change while the model’s weights stay frozen, so the trainable part is tiny and training is fast. This works well when your task is close to what the model already knows, because you are basically giving the model a learned hint in the input.
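For a concrete picture, here is a minimal sketch of setting up prompt tuning with Hugging Face PEFT. The model name, the number of virtual tokens, and the initialization text are just placeholders for illustration:

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

model_name = "bigscience/bloomz-560m"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the virtual-token embeddings defined here are trainable; the base model stays frozen.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,           # initialize soft prompts from real text
    prompt_tuning_init_text="Classify the sentiment of this review:",
    num_virtual_tokens=8,
    tokenizer_name_or_path=model_name,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # typically a tiny fraction of the model
```

The resulting `peft_model` is trained like any other model, but only the soft prompt embeddings receive gradient updates.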

Prefix tuning learns small helper vectors for every layer inside the Transformer. These act like a short extra memory that is added to the attention mechanism at each layer, so they guide the model more deeply than input-only soft prompts. It is still lightweight compared to full fine-tuning but larger than prompt tuning, and it often performs better on tougher tasks. Choose prompt tuning when you want the smallest adapter and quick task swaps, and choose prefix tuning when you want higher quality and can afford a little more size and latency.
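The prefix tuning counterpart looks almost the same in PEFT; again a rough sketch with a placeholder model, where the real change is the config class (and usually a somewhat larger number of virtual tokens):

```python
from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")  # placeholder

# Trainable prefix key/value vectors are injected into every attention layer;
# the base model weights remain frozen.
config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # more than prompt tuning, still tiny vs. full fine-tuning
```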

1 Like

Prompt tuning adds “soft prompts”, which are not actual words (they are learned embeddings), to the input layer only. Only these prompt embeddings are trained; the LLM stays frozen.
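Mechanically it boils down to something like this rough PyTorch sketch (not tied to any particular library; the class name and the 0.02 init scale are just illustrative):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learned embeddings prepended to the input embeddings; the LM itself stays frozen."""
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)  # (batch, num_virtual_tokens + seq_len, embed_dim)
```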

Prefix tuning, on the other hand, is applied at every layer, which makes it slower but better for complex tasks. Instead of prepending embeddings only at the input, it prepends trainable key and value vectors to each layer’s attention mechanism.
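Conceptually, something like the following happens inside each attention layer. This is a simplified sketch with illustrative names and shapes; real implementations such as PEFT typically feed the prefix in through the model’s cached key/value (past_key_values) path instead:

```python
import torch
import torch.nn as nn

class AttentionPrefix(nn.Module):
    """Trainable key/value vectors prepended to one attention layer's keys and values."""
    def __init__(self, prefix_len: int, num_heads: int, head_dim: int):
        super().__init__()
        self.prefix_k = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)
        self.prefix_v = nn.Parameter(torch.randn(num_heads, prefix_len, head_dim) * 0.02)

    def forward(self, k: torch.Tensor, v: torch.Tensor):
        # k, v: (batch, num_heads, seq_len, head_dim) computed by the frozen layer
        batch = k.size(0)
        pk = self.prefix_k.unsqueeze(0).expand(batch, -1, -1, -1)
        pv = self.prefix_v.unsqueeze(0).expand(batch, -1, -1, -1)
        # queries are unchanged; they simply attend over [prefix; sequence]
        return torch.cat([pk, k], dim=2), torch.cat([pv, v], dim=2)
```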

1 Like