Here is my code:
from transformers import MistralForCausalLM, AutoTokenizer, GenerationConfig

model = MistralForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

prompt = "[INST] What's your name? [/INST]"

generate_config = GenerationConfig(
    temperature=1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    output_hidden_states=True,
)

# return_tensors="pt" is needed so generate() receives a tensor, not a Python list
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

generation_output = model.generate(
    input_ids,
    generation_config=generate_config,
    return_dict_in_generate=True,
)

# hidden_states here is a tuple with one entry per generation step,
# and each entry is itself a tuple of per-layer hidden states
print(len(generation_output.hidden_states))
I want to retrieve the final-layer hidden states for every token, including those in the input prompt, without applying the language model head (lm_head). How can I do this?