Is LLaMA rotary embedding implementation correct?

I am afraid your answer is not right, since \theta is not mapped correctly. The key reason the transformers RoPE implementation can get away with rotate_half is that transformers/models/llama/convert_llama_weights_to_hf.py contains a permute function that rearranges the q/k projection weights within each head along the last dimension, leading to something like the sketch below:
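For reference, here is a minimal sketch of that permute step (paraphrased; the exact helper name and signature in the conversion script may differ across transformers versions). It gathers the even (real) components of each rotary pair into the first half of the head and the odd (imaginary) components into the second half, which is exactly the layout rotate_half expects:

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # w: a q/k projection weight of shape (dim1, dim2), dim1 = n_heads * head_dim.
    # Within each head, rows ordered as interleaved pairs [0, 1, 2, 3, ...]
    # (the original Meta checkpoint layout) are reordered to [0, 2, 4, ..., 1, 3, 5, ...],
    # i.e. the half-split pairing (i, i + head_dim // 2) used by rotate_half.
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )

# Toy check: with n_heads=1 and head_dim=4, the row order [0, 1, 2, 3]
# becomes [0, 2, 1, 3] -- odd components are moved to the second half.
w = torch.arange(4).float().unsqueeze(-1)  # (4, 1), one "row id" per output channel
print(permute(w, n_heads=1, dim1=4, dim2=1).squeeze(-1))  # tensor([0., 2., 1., 3.])
```

Because the weights are permuted once at conversion time, applying rotate_half at inference pairs up the same components that the original interleaved RoPE would have paired, so the two formulations stay equivalent.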
