Hi. I’m trying to apply “early exit” to Llama-3.2-11B-Vision-Instruct. I learned that calling:
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    output_hidden_states=True,
    return_dict_in_generate=True,
    num_beams=1,
    do_sample=False,
)
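For reference, as I understand it, `outputs.hidden_states` here is a tuple with one entry per generation step, and each entry is itself a tuple of per-layer tensors (the embedding output plus every decoder layer). The first step’s tensors span the whole prompt, while later steps contain only the single new token. A minimal shape sketch with dummy tensors (sizes are made up; no model needed):

```python
import torch

# Dummy stand-in for outputs.hidden_states from generate():
# one tuple per generated token, one tensor per layer inside it.
batch, prompt_len, hidden, n_layers = 1, 7, 16, 3
step0 = tuple(torch.zeros(batch, prompt_len, hidden) for _ in range(n_layers + 1))
step1 = tuple(torch.zeros(batch, 1, hidden) for _ in range(n_layers + 1))
hidden_states = (step0, step1)

# Step 0 covers every prompt position; the prediction for the first new
# token lives at the LAST position, i.e. hidden_states[0][k][:, -1, :].
assert hidden_states[0][0].shape == (batch, prompt_len, hidden)
assert hidden_states[1][0].shape == (batch, 1, hidden)
```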
returns the hidden states of every layer for each generated token, so I can get the early-exit output of each layer for each token with:
import torch

last_early_exit = []
lm_head_matrix = model.language_model.lm_head.weight
norm_weight = model.language_model.model.norm.weight

for i, token_hidden_state in enumerate(outputs.hidden_states):
    for k in range(len(token_hidden_state)):
        hs = token_hidden_state[k]
        # standardize the hidden state, then scale by the final norm's weight
        mean = hs.mean()
        std = hs.std()
        normalized_hs = (hs - mean) / std
        hs = normalized_hs * norm_weight
        # project onto the vocabulary via the lm_head and take the argmax
        logit = torch.matmul(hs, lm_head_matrix.T)
        vocab_vector = torch.softmax(logit, dim=-1)
        max_prob_index = torch.argmax(vocab_vector, dim=-1)
        max_prob_index = max_prob_index.tolist()[0][0]
        early_exited_token = processor.decode(max_prob_index)
        print(f"token {i}_layer_{k} early exited token : {early_exited_token}, index : {max_prob_index}")
        if k == len(token_hidden_state) - 1:
            last_early_exit.append(early_exited_token)

print(last_early_exit)
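As an aside, from what I can tell in the transformers source, `model.norm` is an RMSNorm. For comparison, a minimal sketch of the standard RMSNorm formula (the eps value is my assumption):

```python
import torch

def rms_norm(hs, weight, eps=1e-6):
    # Llama-style RMSNorm: scale by the root-mean-square over the hidden
    # dimension only; no mean subtraction, unlike (hs - mean) / std above.
    variance = hs.pow(2).mean(-1, keepdim=True)
    return hs * torch.rsqrt(variance + eps) * weight

hs = torch.randn(1, 1, 16)
weight = torch.ones(16)
out = rms_norm(hs, weight)
assert out.shape == hs.shape
```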
The question: the early exit at the last layer should reproduce the model's normal output, since the last hidden state of each token is what produces the final generation. However, I always get the wrong first token, as shown below.
What I got:
['下', ' image', ' presents', ' a', ' close', '-up', ' view', ' of', ' an', ' Indian', ... ]
['下','2', '}', '<|eot_id|>']
Expected:
['The', ' image', ' presents', ' a', ' close', '-up', ' view', ' of', ' an', ' Indian', ... ]
['{','2', '}', '<|eot_id|>']
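For reference, the expected lists come from decoding `outputs.sequences`, which holds the prompt ids followed by the newly generated ids. A minimal sketch with a dummy tensor (the real ids come from generate() and would be decoded with the processor):

```python
import torch

# Dummy stand-in for outputs.sequences: prompt ids followed by new ids.
prompt_ids = [101, 102, 103]
new_ids = [791, 20, 92]
sequences = torch.tensor([prompt_ids + new_ids])

# Slice off the prompt to recover only what the model generated.
prompt_len = len(prompt_ids)
generated_ids = sequences[0, prompt_len:].tolist()
assert generated_ids == new_ids
```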
Do you know why the first token always comes out as ‘下’, regardless of the prompt?