In the modeling_llama.py file, I can see that the rotary embeddings are applied (at line 271) after transposing the time and head dimensions of the keys and values (done at lines 267 and 268). In the official Meta implementation (llama/model.py in the meta-llama/llama GitHub repo), the rotary embeddings are applied before the transpositions. Why is this? I am asking because when I reimplement the model in PyTorch and load the weights, I get slightly different token distributions than when I load a Llama model through the transformers library.
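As far as I can tell, the ordering relative to the transpose should be mathematically equivalent, since RoPE only acts along the sequence and head-dim axes and the transpose just swaps the seq and head axes. Here is a minimal sketch I used to check this; `apply_rope` and the cos/sin cache construction are illustrative (following the rotate_half convention used in transformers), not the library's actual functions:

```python
import torch

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # Rotate positions along the last (head_dim) axis.
    return x * cos + rotate_half(x) * sin

batch, seq_len, n_heads, head_dim = 2, 5, 4, 8

# Illustrative cos/sin cache, shape (seq_len, head_dim).
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2) / head_dim))
t = torch.arange(seq_len, dtype=torch.float32)
freqs = torch.outer(t, inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()

q = torch.randn(batch, seq_len, n_heads, head_dim)

# Order A: transpose to (batch, heads, seq, head_dim) first, then apply RoPE.
out_a = apply_rope(q.transpose(1, 2), cos, sin)

# Order B: apply RoPE in (batch, seq, heads, head_dim) layout, then transpose.
# cos/sin gain a singleton heads axis so broadcasting lines up.
out_b = apply_rope(q, cos.unsqueeze(1), sin.unsqueeze(1)).transpose(1, 2)

print(torch.allclose(out_a, out_b))  # True: the two orderings agree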
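Since the orderings agree, I suspect the mismatch I see comes from a convention difference instead: if I recall correctly, the Meta code rotates interleaved even/odd pairs via complex multiplication, while transformers rotates the two halves of head_dim (rotate_half) and compensates by permuting the q/k projection weights in its checkpoint conversion script. Loading the raw Meta weights into a rotate_half-style implementation without that permutation would produce slightly different outputs.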
Some of the development staff also browse this forum, but I think it's better to open an issue on GitHub if you have questions about the library's implementation.