Why are the logits of the GPT-2 model from transformers different when I use the cache compared to a forward pass from scratch?

juanka0357 · October 24, 2023, 1:55pm

I'm trying to use past_key_values to speed up inference:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

torch.set_default_device("cuda")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
model.to("cuda")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Full forward pass over all five token ids, without a cache
seq = torch.tensor([1, 2, 3, 4, 5])
original_out = model(input_ids=seq).logits

# Forward pass over the first three ids, keeping the key/value cache
seq2 = torch.tensor([1, 2, 3])
key_values = model(input_ids=seq2, use_cache=True).past_key_values

# Forward pass over the remaining two ids, reusing the cache
new_seq = torch.tensor([4, 5])
magic = model(input_ids=new_seq, past_key_values=key_values).logits

# Compare the logits at the last position from both paths
print(torch.equal(original_out[-1, :], magic[-1, :]))

But this prints False, whereas I expected it to print True.

nqgl · February 25, 2024, 8:10am

Hey, the only issue here is that you used torch.equal.
Floating-point arithmetic picks up small rounding errors, so results should not be expected to match bit for bit. This is expected, normal, and usually totally fine.
However, it does mean that strict equality checks will often fail when you would want them to pass.
This is why torch.allclose exists. You may need to set atol and rtol higher in some circumstances, but here it works fine. torch.allclose just checks that all the values are close, within those tolerances.

Trying your code with allclose, it returns True.
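To make the point concrete, here is a minimal sketch that runs without GPT-2 or a GPU. The first part shows that the same sum evaluated in a different order can round differently, which is roughly what may happen when the cached and uncached passes compute the same logits through differently shaped operations. The second part uses a hypothetical helper, report_difference, on made-up stand-in values (not real GPT-2 logits) to compare torch.equal with torch.allclose. As far as I know, torch.allclose checks |a - b| <= atol + rtol * |b| elementwise.

```python
import torch

# Floating-point addition is not associative: the same three numbers
# summed in a different order can round to different results.
a = torch.tensor(0.1, dtype=torch.float64)
b = torch.tensor(0.2, dtype=torch.float64)
c = torch.tensor(0.3, dtype=torch.float64)
left = (a + b) + c
right = a + (b + c)
print(torch.equal(left, right))     # strict, bitwise-style comparison
print(torch.allclose(left, right))  # comparison within a tolerance


def report_difference(full, cached, rtol=1e-5, atol=1e-8):
    """Hypothetical helper (not part of torch or transformers).

    Summarizes how two tensors differ. torch.allclose tests
    |full - cached| <= atol + rtol * |cached| for every element.
    """
    return {
        "equal": torch.equal(full, cached),
        "allclose": torch.allclose(full, cached, rtol=rtol, atol=atol),
        "max_abs_diff": (full - cached).abs().max().item(),
    }


# Made-up stand-in values, NOT real GPT-2 logits: the second tensor is
# the first one plus a tiny perturbation that mimics rounding noise.
full = torch.tensor([-33.0, -32.5, -35.25], dtype=torch.float32)
cached = full + 1e-5
result = report_difference(full, cached)
print(result)
```

In the script from the question, the equivalent change would be to replace torch.equal(original_out[-1, :], magic[-1, :]) with torch.allclose(original_out[-1, :], magic[-1, :]). Printing the maximum absolute difference as well is a handy way to judge whether a mismatch is just rounding noise or an actual bug.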
