Caching computations for a common prefix in prompts

I want to perform zero-shot text classification using a large language model, with prompt engineering/in-context learning. The first part of the prompt will always be the same: it will contain the task description and the list of possible categories. The second part of the prompt, however, will vary depending on the sentence to classify. Instead of feeding the whole prompt to the model every time, is it possible to cache some partial computation for the first, fixed part of the prompt?
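
To make it concrete, here is a minimal sketch of the kind of thing I have in mind, assuming a Hugging Face causal LM and its `past_key_values` mechanism (the model name, prompt strings, and the `classify` helper are just placeholders):

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any decoder-only (causal) LM should work; "gpt2" is just a small example.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Fixed part: task description plus the candidate categories.
fixed_prefix = (
    "Classify the sentence into one of these categories: "
    "sports, politics, technology.\n"
)
prefix_ids = tokenizer(fixed_prefix, return_tensors="pt").input_ids

# Run the fixed prefix once and keep the attention key/value cache.
with torch.no_grad():
    prefix_out = model(prefix_ids, use_cache=True)

def classify(sentence: str) -> torch.Tensor:
    # Deep-copy the cache: recent transformers versions update the cache
    # object in place, so reusing the original across calls would corrupt it.
    past = copy.deepcopy(prefix_out.past_key_values)
    suffix = f"Sentence: {sentence}\nCategory:"
    suffix_ids = tokenizer(suffix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(suffix_ids, past_key_values=past, use_cache=True)
    # Logits for the token that would follow the full prompt.
    return out.logits[0, -1]
```

One caveat I'm aware of: tokenizing the prefix and suffix separately can produce slightly different tokens at the boundary than tokenizing the concatenated prompt, so the split point would need some care. Is there a more standard or more efficient way to do this?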


Would love to get some pointers in the right direction as well.