I am afraid your answer is not quite right, since the θ values are not mapped correctly. The key reason the transformers implementation of RoPE can use `rotate_half` is that `transformers/models/llama/convert_llama_weights_to_hf.py` contains a `permute` function that rearranges the query/key projection weights of each attention head from interleaved-pair order into half-split order, like this:
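A minimal sketch of what that permutation does (this is a NumPy re-implementation for illustration; the actual script uses torch `.view`/`.transpose`, and the shapes here are toy values, not the real LLaMA dimensions). Within each head, it moves the even-indexed rows to the first half and the odd-indexed rows to the second half, which is exactly the layout `rotate_half` assumes:

```python
import numpy as np

def permute(w, n_heads, dim1, dim2):
    # Reinterpret the rows of each head as (head_dim // 2, 2) interleaved
    # pairs, move the pair axis in front of them, and flatten back.
    # Result per head: even-indexed rows first, then odd-indexed rows.
    return (w.reshape(n_heads, dim1 // n_heads // 2, 2, dim2)
             .transpose(0, 2, 1, 3)
             .reshape(dim1, dim2))

# Toy example: 1 head, head_dim = 8; row i of w just holds the value i,
# so we can read off where each row ends up after the permutation.
dim = 8
w = np.arange(dim, dtype=float)[:, None]
out = permute(w, n_heads=1, dim1=dim, dim2=1)
print(out.ravel().astype(int))  # [0 2 4 6 1 3 5 7]
```

So the pair (x_0, x_1) that the original RoPE formulation rotates together ends up at positions 0 and `head_dim // 2`, which is why `rotate_half` (negate the second half and swap the halves) applies the same rotation as the interleaved formulation.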
