I am afraid your answer is not quite right, since the θ values are not mapped correctly. The key reason the transformers implementation of RoPE can use `rotate_half` is that `transformers/models/llama/convert_llama_weights_to_hf.py` contains a `permute` function that rearranges the query/key projection weights of each attention head from interleaved-pair order into half-split order, like this:
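A minimal sketch of what that permutation does (this is a NumPy re-implementation for illustration; the actual script uses torch `.view`/`.transpose`, and the shapes here are toy values, not the real LLaMA dimensions). Within each head, it moves the even-indexed rows to the first half and the odd-indexed rows to the second half, which is exactly the layout `rotate_half` assumes:

```python
import numpy as np

def permute(w, n_heads, dim1, dim2):
    # Reinterpret the rows of each head as (head_dim // 2, 2) interleaved
    # pairs, move the pair axis in front of them, and flatten back.
    # Result per head: even-indexed rows first, then odd-indexed rows.
    return (w.reshape(n_heads, dim1 // n_heads // 2, 2, dim2)
             .transpose(0, 2, 1, 3)
             .reshape(dim1, dim2))

# Toy example: 1 head, head_dim = 8; row i of w just holds the value i,
# so we can read off where each row ends up after the permutation.
dim = 8
w = np.arange(dim, dtype=float)[:, None]
out = permute(w, n_heads=1, dim1=dim, dim2=1)
print(out.ravel().astype(int))  # [0 2 4 6 1 3 5 7]
```

So the pair (x_0, x_1) that the original RoPE formulation rotates together ends up at positions 0 and `head_dim // 2`, which is why `rotate_half` (negate the second half and swap the halves) applies the same rotation as the interleaved formulation.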
