Embeddings from llama2

Hi @Saaira,

Just found the why and how for this question.

The pipeline generally returns the first available tensor in the model output, which for the Llama model is the logits, not the hidden states.

  1. pipeline source code
  2. Llama doc

The pipeline is efficient and keeps the code neat, but to get embeddings you can call the model directly instead:

embeddings = model(torch.tensor([tokenizer(sentences)['input_ids'][0]]), return_dict=True, output_hidden_states=True)

This gives you the hidden states from all the layers (including the embedding layer) for each token.
For the first sentence you will get:

len(embeddings['hidden_states']), embeddings['hidden_states'][0].shape
(33, torch.Size([1, 4, 4096]))
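
If you want to check this behaviour without downloading the real weights, here is a minimal sketch. It builds a tiny, randomly initialised Llama model, so every config value and the token ids below are made-up assumptions for illustration, not the Llama 2 settings. It only shows two things: the first element of the output is the logits, and `hidden_states` has one entry for the embedding layer plus one per decoder layer (which is where 33 = 1 + 32 comes from in the output above).

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny config chosen only so the model builds instantly on CPU.
# These sizes are illustrative assumptions, not the real Llama 2 values.
config = LlamaConfig(
    vocab_size=100,
    hidden_size=16,
    intermediate_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    num_key_value_heads=2,
    max_position_embeddings=32,
)
model = LlamaForCausalLM(config)  # random weights, nothing is downloaded
model.eval()

# Four arbitrary token ids standing in for a tokenized sentence.
input_ids = torch.tensor([[1, 5, 7, 9]])

with torch.no_grad():
    out = model(input_ids, return_dict=True, output_hidden_states=True)

# The first tensor in the output is the logits (what the pipeline picks up).
print(out[0].shape)

# One entry for the embedding layer plus one per decoder layer.
print(len(out.hidden_states), out.hidden_states[0].shape)
```

With the real checkpoint you would load the model and tokenizer with `from_pretrained` as usual; the indexing into `hidden_states` stays the same.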