Embeddings from llama2

Hi @Saaira,

Just found the why and how for this question.

Why:
The pipeline generally returns the first tensor available in the model output, which for a Llama causal-LM model is the logits rather than the hidden states.
Ref:

  1. pipeline source code
  2. Llama doc
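
To see this concretely, here is a minimal sketch (the checkpoint name is an assumption; any Llama-2 causal-LM checkpoint behaves the same way): the first tensor of the model output has the vocabulary size as its last dimension, i.e. it is the logits, not an embedding.

```python
# Sketch only: shows that the first output tensor of a causal-LM Llama is the logits.
# The checkpoint name below is an assumption; swap in your own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs[0] is the logits: (batch, seq_len, vocab_size), e.g. torch.Size([1, 4, 32000]),
# which is what ends up being handed back instead of hidden states.
print(outputs[0].shape)
```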

How:
Instead of using the pipeline (convenient as it is), call the model directly and ask for the hidden states:

embeddings = model(torch.tensor([tokenizer(sentences)['input_ids'][0]]), return_dict=True, output_hidden_states=True)

This gives you the hidden states from all layers (including the embedding layer) for every token. For the first sentence you will get:

len(embeddings['hidden_states']), embeddings['hidden_states'][0].shape
(33, torch.Size([1, 4, 4096]))
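
For completeness, here is a self-contained sketch of the whole flow, plus one common way to turn the token-level hidden states into a single sentence vector (mean pooling over the last layer). The checkpoint name and the pooling choice are assumptions, not the only option.

```python
# Self-contained sketch (checkpoint name and mean pooling are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

sentences = ["Embeddings from llama2"]
inputs = tokenizer(sentences, return_tensors="pt")

with torch.no_grad():
    embeddings = model(**inputs, return_dict=True, output_hidden_states=True)

hidden_states = embeddings['hidden_states']
# 33 tensors for Llama-2-7B: the embedding layer output plus the 32 decoder layers,
# each of shape (batch, seq_len, 4096).
print(len(hidden_states), hidden_states[0].shape)

# One way to get a single vector per sentence: mean-pool the last layer's token states.
sentence_vector = hidden_states[-1].mean(dim=1)  # shape: (batch, 4096)
print(sentence_vector.shape)
```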