past_key_values with multiple new tokens

I have a GPT-style model that I’m using to generate text for a set of prompts. All of the prompts share a common prefix, followed by a variable suffix. For example, something like:

Prompt 1: “A B C X Y”
Prompt 2: “A B C W V”
Prompt 3: “A B C Q R S”

A simple way of processing Prompt 1 using key-value caching is like this:

abcxy_tokens = tokenize("A B C X Y")
result = model(abcxy_tokens, use_cache=True)
past_key_values = result['past_key_values']
# greedy-pick the next token from the logits at the last position
predicted_token = argmax(result['logits'][:, -1, :])
# feed only the new token; the cache covers everything before it
result = model(predicted_token, use_cache=True, past_key_values=past_key_values)
... (repeat, updating past_key_values from each result)
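
For reference, here’s a fully runnable version of that loop (GPT-2 and the 10-step greedy loop are just stand-ins for illustration; my real model class may differ):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tok("A B C X Y", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids, use_cache=True)
past_key_values = out.past_key_values

for _ in range(10):  # greedy decoding, 10 new tokens
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    with torch.no_grad():
        out = model(next_id, use_cache=True, past_key_values=past_key_values)
    past_key_values = out.past_key_values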

This works great for me, but I would like to save more compute by caching the key-values for the shared prefix, “A B C”. I want to do something like this:

abc_tokens = tokenize("A B C")
prefix_key_values = model(abc_tokens, use_cache=True)['past_key_values']
# prefix_key_values can now be re-used for prompt 1, prompt 2, etc.

suffix_tokens = tokenize("X Y")
result = model(suffix_tokens, use_cache=True, past_key_values=prefix_key_values)
...

However, when I try this, I get an error complaining about a shape mismatch. The code only works if there is a single new token.

I don’t see any fundamental reason why key-value caching shouldn’t work with multiple new tokens. Does the existing implementation silently assume a single new token? Is there an alternate method I should be using for this scenario?

This should be possible; in fact, I’ve done something very similar before. Could you share the error message, the stack trace, and which Hugging Face model class you’re using?
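
In the meantime, here’s roughly the pattern that has worked for me, sketched with GPT-2 (the checkpoint and the strings are illustrative). The main thing to watch is that if you pass an attention_mask alongside past_key_values, it has to span the cached prefix plus the new tokens, not just the new tokens:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Run the shared prefix once and keep its cache.
prefix_ids = tok("A B C", return_tensors="pt").input_ids
with torch.no_grad():
    prefix_key_values = model(prefix_ids, use_cache=True).past_key_values

# Feed a multi-token suffix on top of the cached prefix.
suffix_ids = tok(" X Y", return_tensors="pt").input_ids

# The mask must cover cached positions + new positions.
attention_mask = torch.ones(1, prefix_ids.size(1) + suffix_ids.size(1), dtype=torch.long)
with torch.no_grad():
    result = model(suffix_ids,
                   past_key_values=prefix_key_values,
                   attention_mask=attention_mask,
                   use_cache=True)
next_token = result.logits[:, -1, :].argmax(dim=-1, keepdim=True)

One caveat: depending on your transformers version, past_key_values may come back as a tuple of tensors or as a Cache object; if it’s a cache that gets updated in place, you’ll want to copy it before reusing the same prefix across prompts.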