Get each generated token's last-layer hidden state

I am using a Llama 2 causal LM:

    multimodal_embeddings, multimodal_attention_mask = self._build_multimodal_attention(
        input_embeddings, projected_patch_embeddings, attention_mask
    )
    language_model_output = self.language_model.generate(
        inputs_embeds=multimodal_embeddings,
        max_new_tokens=8,
        output_hidden_states=True,
        return_dict_in_generate=True,
    )

I want to get the last-layer hidden state of each generated token.
But I don't know whether language_model_output.hidden_states[0][-1] is the first generated token's hidden state, because its shape is different:

(Pdb) language_model_output.hidden_states[0][-1].shape
torch.Size([1, 535, 4096])  # why is this the same as multimodal_embeddings.shape, not 1?
(Pdb) language_model_output.hidden_states[1][-1].shape
torch.Size([1, 1, 4096])
(Pdb) multimodal_embeddings.shape
torch.Size([1, 535, 4096])

This looks like the same problem as Wrong shape of hidden_states and attentions when generating · Issue #26174 · huggingface/transformers · GitHub
From this, it seems we may need to generate one more token? Wrong shape of last layer hidden states when generating · Issue #30036 · huggingface/transformers · GitHub


Hmm, that’s difficult…

Why is it difficult?
language_model_output.hidden_states[1][-1].shape is torch.Size([1, 1, 4096]) — that is a hidden state, right?
Can we grab the first generated token's hidden state with language_model_output.hidden_states[0][-1][:, -1, :]?
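I believe so. hidden_states is a tuple with one entry per generation step, and each entry is a tuple over layers. Step 0 is the prefill over the whole prompt, so its sequence length equals the prompt length (535 here); every later step processes exactly one token. The hidden state at the last prompt position is the one that predicts the first generated token, so [:, -1, :] picks it out. A minimal sketch of the indexing with dummy tensors that mimic the shapes from your pdb session (the real values would come from generate; the sizes here are assumptions matching your output):

```python
import torch

# Mimic the structure of generate(..., output_hidden_states=True,
# return_dict_in_generate=True).hidden_states:
# a tuple over generation steps, each a tuple over layers.
# Step 0 covers the full prompt (length 535); later steps cover 1 token each.
prompt_len, hidden_size, num_layers, max_new_tokens = 535, 4096, 3, 8
hidden_states = tuple(
    tuple(
        torch.randn(1, prompt_len if step == 0 else 1, hidden_size)
        for _ in range(num_layers)
    )
    for step in range(max_new_tokens)
)

# Last layer ([-1]), last position ([:, -1, :]) of each step gives the hidden
# state that produced that step's token; step 0 yields the first generated token.
per_token_hidden = torch.cat(
    [step_states[-1][:, -1, :] for step_states in hidden_states], dim=0
)
print(per_token_hidden.shape)  # one row per generated token: (8, 4096)
```

Note that with max_new_tokens=8 you get 8 entries in hidden_states; the hidden state *of* the final generated token (i.e., after the model has processed it) would only appear if one more step were run, which is what the linked issue is about.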
