Hi @Saaira,
Just found the why and how for this question.
Why:
The pipeline generally returns the first available tensor in the model output, which for the Llama model is the logits.
Ref:
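A quick way to see this (just a sketch, assuming `model` is the already-loaded `LlamaForCausalLM` and `tokenizer` its tokenizer): inspect the model output object and check which tensor comes first.

```python
# Sketch: inspect the model output to see which tensor the pipeline would grab first.
# Assumes `model` (LlamaForCausalLM) and `tokenizer` are already loaded.
inputs = tokenizer("a short test sentence", return_tensors="pt")
out = model(**inputs, return_dict=True)

print(list(out.keys()))      # e.g. ['logits', 'past_key_values'] -- no hidden states by default
print(out[0] is out.logits)  # True: the first available tensor is the logits
```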
How:
Instead of using the pipeline (which is nice for efficiency and tidy code), call the model directly with `output_hidden_states=True`:

```python
embeddings = model(torch.IntTensor([tokenizer(sentences)['input_ids'][0]]), return_dict=True, output_hidden_states=True)
```

Then `embeddings['hidden_states']` gives you the hidden states from all the layers (including the embedding layer) for each token.
For the first sentence you will get:

```python
>>> len(embeddings['hidden_states']), embeddings['hidden_states'][0].shape
(33, torch.Size([1, 4, 4096]))
```

That is 33 layers (the embedding layer plus 32 decoder layers), for a batch of 1 sentence with 4 tokens and a hidden size of 4096.
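In case it helps, here is a fuller sketch of the same idea. The checkpoint name, the example sentence, and the mean-pooling step are my own assumptions, not part of the original answer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint -- use whichever Llama checkpoint you actually loaded.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

sentences = ["The quick brown fox jumps."]

with torch.no_grad():
    inputs = tokenizer(sentences, return_tensors="pt")
    embeddings = model(**inputs, return_dict=True, output_hidden_states=True)

hidden_states = embeddings["hidden_states"]   # tuple: embedding layer + one entry per decoder layer
last_layer = hidden_states[-1]                # shape: (batch, seq_len, hidden_size)
sentence_embedding = last_layer.mean(dim=1)   # simple mean pooling over tokens (one option among many)

print(len(hidden_states), last_layer.shape, sentence_embedding.shape)
```

`hidden_states[-1]` is the last decoder layer; which layer (or combination of layers) and which pooling work best for sentence embeddings depends on your task.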