I read that prompt tuning and prefix tuning are two effective mechanisms for leveraging frozen language models on downstream tasks. What is the difference between the two, and how do they actually work?
They're closely related ideas that came out around the same time. The key difference is that prefix tuning learns soft prompts at every layer of the model (trainable prefix vectors injected into each layer's attention), while prompt tuning only prepends learned embeddings at the input layer. This is discussed in the p-tuning line of work.
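To make the prompt-tuning side concrete, here is a minimal sketch (using numpy, with made-up dimensions) of the input-layer mechanism: the embedding table is frozen, and the only trainable parameters are a handful of "virtual token" vectors prepended to the input embeddings. Prefix tuning would instead inject learned vectors like these into every layer, not just the input.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, num_virtual_tokens = 100, 16, 5

# Frozen model weights: the token embedding table is never updated.
frozen_embeddings = rng.normal(size=(vocab_size, d_model))

# Prompt tuning: the ONLY trainable parameters are these soft-prompt
# vectors, which get prepended to the input embeddings.
soft_prompt = rng.normal(size=(num_virtual_tokens, d_model))

def embed_with_soft_prompt(token_ids):
    """Look up frozen embeddings and prepend the learned soft prompt."""
    token_embs = frozen_embeddings[token_ids]          # (seq_len, d_model)
    return np.concatenate([soft_prompt, token_embs])   # (v + seq_len, d_model)

x = embed_with_soft_prompt(np.array([3, 14, 15, 92]))
print(x.shape)  # (9, 16): 5 virtual tokens + 4 real tokens
```

During training, gradients flow back through the frozen model to `soft_prompt` alone, so the per-task storage cost is just `num_virtual_tokens * d_model` floats.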
The introduction in this tutorial notebook has a nice explanation: Multitask_Prompt_and_PTuning.ipynb on the `main` branch of the NVIDIA/NeMo GitHub repo.