GPT2: hidden states obtained via output_hidden_states differ from those obtained via register_forward_hook

Hi, I want to get the output of the 20th GPT2Block in a GPT2-medium model (24 GPT2Block blocks in total). I have used register_forward_hook and output_hidden_states separately, but they give different results.

My code is as follows:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

model_config = AutoConfig.from_pretrained('gpt2-medium', output_hidden_states=True, return_dict_in_generate=True)
model = AutoModelForCausalLM.from_pretrained('gpt2-medium', config=model_config).cuda()
tok = AutoTokenizer.from_pretrained('gpt2-medium')
tok.pad_token = tok.eos_token  # GPT2 has no pad token; reuse EOS so padding works

# define hook
def hook(module, fea_in, fea_out):  # collect the module's output
    global features_in_hook  # module-level variable, so 'global' (not 'nonlocal')
    features_in_hook = fea_out.clone().detach()
    return fea_out

model.eval()  # eval mode, so dropout is disabled
features_in_hook = None  # will hold the hooked output
for name, module in model.named_modules():
    if name == 'transformer.h.19.mlp.dropout':  # dropout at the end of the 20th block's MLP
        h = module.register_forward_hook(hook=hook)

prompt_tok = tok(["Who are you?", "What university are you in?"], padding=True, return_tensors="pt").to("cuda")
hidden_state = model(**prompt_tok)[2][20]

If I have done everything right, hidden_state should be the same as features_in_hook: model(**prompt_tok)[2] has length 25, the first entry is the word-embedding output, so the output of the 20th block should be at index 20. However, the results obtained in these two ways are not the same. Have I done something wrong?
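
For comparison, here is a minimal sketch that hooks the whole 20th block (transformer.h.19) instead of a submodule inside it, assuming a GPT2Block's forward returns a tuple whose first element is its hidden states:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('gpt2-medium', output_hidden_states=True).cuda()
model.eval()
tok = AutoTokenizer.from_pretrained('gpt2-medium')
tok.pad_token = tok.eos_token  # GPT2 has no pad token; reuse EOS

block_out = None
def block_hook(module, inputs, outputs):
    global block_out
    block_out = outputs[0].detach().clone()  # [0] is the block's hidden states (assumption stated above)

h = model.transformer.h[19].register_forward_hook(block_hook)  # hook the whole 20th block

batch = tok(["Who are you?", "What university are you in?"], padding=True, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model(**batch)
h.remove()

print(torch.allclose(block_out, out.hidden_states[20]))  # True if hidden_states[20] is the 20th block's output

If this prints True, then the mismatch in my original code would come from where the hook is attached rather than from the index 20 itself.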
