outputs.hidden_states[0][-1] always decodes to the same token regardless of the question

Hi. I’m trying to apply “early exit” to llama3.2-vision-instruct 11B.

I learned that using:

outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    output_hidden_states=True, 
    return_dict_in_generate=True,
    num_beams=1,
    do_sample=False
)

returns the hidden states of all layers for each output token, so I can get the early-exit output of every layer for every token with:

import torch

last_early_exit = []
lm_head_matrix = model.language_model.lm_head.weight
norm_weight = model.language_model.model.norm.weight

# outer loop: one entry per generation step; inner loop: one entry per layer
for i, token_hidden_state in enumerate(outputs.hidden_states):
    for k in range(len(token_hidden_state)):
        hs = token_hidden_state[k]
        # standardize the hidden state (mean/std), scale by the final-norm
        # weight, then project onto the vocabulary with the LM head
        mean = hs.mean()
        std = hs.std()
        normalized_hs = (hs - mean) / std
        hs = normalized_hs * norm_weight
        logit = torch.matmul(hs, lm_head_matrix.T)
        vocab_vector = torch.softmax(logit, dim=-1)
        max_prob_index = torch.argmax(vocab_vector, dim=-1)
        # take batch 0, position 0 of the argmax result
        max_prob_index = max_prob_index.tolist()[0][0]
        early_exited_token = processor.decode(max_prob_index)
        print(f"token {i}_layer_{k} early exited token : {early_exited_token}, index : {max_prob_index}")
        # the last layer's early exit should equal the model's actual output token
        if k == len(token_hidden_state) - 1:
            last_early_exit.append(early_exited_token)

print(last_early_exit)
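
For context, this is the layout I am assuming for outputs.hidden_states, based on the transformers docs for generate with output_hidden_states=True; the shapes in the comments are illustrative, not measured:

# outputs.hidden_states[step][layer] -> tensor of hidden states
#   step 0 is the forward pass over the whole prompt -> (batch, prompt_len, hidden_dim)
#   step i > 0 covers one newly generated token      -> (batch, 1, hidden_dim)
#   layer 0 is the embedding output; the last entry is the final decoder layer
first_step_last_layer = outputs.hidden_states[0][-1]
later_step_last_layer = outputs.hidden_states[1][-1]
print(first_step_last_layer.shape)  # e.g. (1, prompt_len, hidden_dim)
print(later_step_last_layer.shape)  # e.g. (1, 1, hidden_dim)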

My expectation is that the code above should reproduce the model's normal output, since an early exit at the last hidden state of each token is exactly the model's final generation. However, the first token is always wrong, as shown below.

What I got:

['下', ' image', ' presents', ' a', ' close', '-up', ' view', ' of', ' an', ' Indian', ... ]
['下','2', '}', '<|eot_id|>']

Expected:

['The', ' image', ' presents', ' a', ' close', '-up', ' view', ' of', ' an', ' Indian', ... ]
['{','2', '}', '<|eot_id|>']
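
For reference, the expected lists are just the model's normal generation, decoded token by token from outputs.sequences with the prompt tokens sliced off, roughly like this:

# prompt_len is the number of input tokens from the processor output
prompt_len = inputs["input_ids"].shape[1]
generated_ids = outputs.sequences[0][prompt_len:]
print([processor.decode(t) for t in generated_ids])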

Do you know why the first token always results in ‘下’, regardless of the question?
