Caching computations for a common prefix in prompts

I want to perform zero-shot text classification using a large language model, with prompt engineering/in-context learning. The first part of the prompt will always be the same: it will contain the task description and the list of possible categories. The second part of the prompt, however, will vary depending on the sentence to classify. Instead of feeding the whole prompt to the model every time, is it possible to cache some partial computation for the first, fixed part of the prompt?
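
To make it concrete, here is a minimal sketch of the kind of thing I have in mind, assuming a Hugging Face causal LM and its `past_key_values` mechanism (the model name, prompt strings, and the `classify` helper are just placeholders):

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any decoder-only (causal) LM should work; "gpt2" is just a small example.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Fixed part: task description plus the candidate categories.
fixed_prefix = (
    "Classify the sentence into one of these categories: "
    "sports, politics, technology.\n"
)
prefix_ids = tokenizer(fixed_prefix, return_tensors="pt").input_ids

# Run the fixed prefix once and keep the attention key/value cache.
with torch.no_grad():
    prefix_out = model(prefix_ids, use_cache=True)

def classify(sentence: str) -> torch.Tensor:
    # Deep-copy the cache: recent transformers versions update the cache
    # object in place, so reusing the original across calls would corrupt it.
    past = copy.deepcopy(prefix_out.past_key_values)
    suffix = f"Sentence: {sentence}\nCategory:"
    suffix_ids = tokenizer(suffix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(suffix_ids, past_key_values=past, use_cache=True)
    # Logits for the token that would follow the full prompt.
    return out.logits[0, -1]
```

One caveat I'm aware of: tokenizing the prefix and suffix separately can produce slightly different tokens at the boundary than tokenizing the concatenated prompt, so the split point would need some care. Is there a more standard or more efficient way to do this?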


Would love to get some pointers in the right direction as well.