How to calculate embeddings with Llama-2 model


I would like to calculate embeddings using a Llama-2 model and the HuggingFaceEmbedding class:

from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="meta-llama/Llama-2-7b-chat-hf")
embeddings = embed_model.get_text_embedding("Hello World!")

But I get the following exception, which I don't know how to get around:

Using pad_token, but it is not set yet.
Traceback (most recent call last):
  File "/home/ttpuser/chatgpt/embeddings-test/", line 4, in <module>
    embeddings = embed_model.get_text_embedding("Hello World!")
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/llama_index/embeddings/", line 185, in get_text_embedding
    text_embedding = self._get_text_embedding(text)
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/llama_index/embeddings/", line 184, in _get_text_embedding
    return self._embed([text])[0]
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/llama_index/embeddings/", line 146, in _embed
    encoded_input = self._tokenizer(
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/transformers/", line 2602, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/transformers/", line 2688, in _call_one
    return self.batch_encode_plus(
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/transformers/", line 2870, in batch_encode_plus
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/home/ttpuser/.pyenv/versions/3.11.5/lib/python3.11/site-packages/transformers/", line 2507, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

Does anyone know how to get around this?
Thanks a lot!
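
For context on the error itself: batching texts of different lengths requires filling the shorter ones up to a common length, and the tokenizer has no token to fill with. The toy sketch below is not the transformers implementation; `pad_batch` and the token ids are hypothetical, and right-side padding is an assumption made for illustration. It only shows why a pad token id is needed and what the resulting attention mask looks like.

```python
def pad_batch(sequences, pad_token_id):
    """Right-pad lists of token ids to equal length and build attention masks."""
    if pad_token_id is None:
        # Same situation as in the traceback: nothing to fill short rows with.
        raise ValueError("Asking to pad but no padding token is set.")
    width = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_token_id] * (width - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that later steps should ignore.
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Hypothetical token ids; 2 stands in for an EOS id reused as the pad id.
ids, mask = pad_batch([[1, 15043, 2787], [1, 15043]], pad_token_id=2)
print(ids)
print(mask)
```

Reusing the EOS token as the pad token, as the error message suggests, works because the attention mask tells the model which positions are padding.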

I tried the following with another model and got past the error, but the embeddings were always the same, no matter what the text was.

from llama_index.embeddings import HuggingFaceEmbedding
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
# The tokenizer has no pad token set, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
embed_model = HuggingFaceEmbedding(model=model, model_name='meta-llama/Llama-2-7b-chat-hf', tokenizer=tokenizer, tokenizer_name='meta-llama/Llama-2-7b-chat-hf')
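
A possible explanation for the identical embeddings, offered as an assumption rather than something confirmed in this thread: if the embedding class pools the hidden state of the first token (CLS-style pooling), then with a decoder-only model such as Llama-2 that token is always the same BOS token, and under causal attention its hidden state cannot depend on the text that follows. Averaging over all real tokens avoids this. The toy sketch below uses made-up 2-dimensional vectors and hypothetical helper names (`first_token_pool`, `mean_pool`); real code would apply the same idea to the model's hidden states and attention mask.

```python
def first_token_pool(token_vectors):
    """Use only the first token's vector as the text embedding."""
    return list(token_vectors[0])


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padded positions."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Made-up per-token vectors for two texts; both start with the same "BOS" vector.
seq_a = [[1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]  # last position is padding
mask_a = [1, 1, 0]
seq_b = [[1.0, 0.0], [3.0, 4.0], [2.0, 2.0]]
mask_b = [1, 1, 1]

print(first_token_pool(seq_a), first_token_pool(seq_b))  # identical for both texts
print(mean_pool(seq_a, mask_a), mean_pool(seq_b, mask_b))  # differ between texts
```

Some versions of HuggingFaceEmbedding accept a pooling option; whether yours does, and what its default is, is worth checking in the documentation for your installed version.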

I don't know if this helps, but try sentence-transformers for embeddings; it is fast and lightweight, and it works really well. I also tried generating embeddings with Llama 2 and failed, but sentence-transformers' all-MiniLM-L12-v2 worked as well as I had hoped.
You can run it on Colab too, with no additional resources such as a GPU.

Hi @gasparuben, AnglE-LLaMA is a good choice for generating LLaMA embeddings. It has achieved state-of-the-art performance on the STS benchmark.

GitHub: SeanLee97/AnglE (Angle-optimized Text Embeddings)
HF: SeanLee97/angle-llama-7b-nli-v2


python -m pip install -U angle-emb

from angle_emb import AnglE, Prompts

# init
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')

# set prompt
print('All predefined prompts:', Prompts.list_prompts())
print('prompt:', angle.prompt)

# encode text
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
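
To check whether two embeddings actually differ (for example, two vectors returned by an encoder for two different texts), cosine similarity is a common measure. This is a minimal pure-Python sketch with made-up vectors; `cosine_similarity` is a hypothetical helper, not part of any library mentioned in this thread.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings.
vec_a = [1.0, 0.0]
vec_b = [0.0, 1.0]
print(cosine_similarity(vec_a, vec_a))  # same direction
print(cosine_similarity(vec_a, vec_b))  # orthogonal directions
```

A similarity very close to 1 for every pair of unrelated texts would suggest that the embeddings are not capturing the input.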