Is LLaMA rotary embedding implementation correct?

@reminisce I think you’re correct — I saw the same thing. The embeddings are supposed to be interleaved. As implemented, the embeddings take the same form at dim 0 as they do at dim x.shape[-1]//2, rather than pairing adjacent dimensions as in the paper.
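
A minimal sketch of the two pairing conventions as I understand them (hypothetical helper names, not taken from the repo). The paper interleaves adjacent dims, while the implementation in question pairs dim i with dim i + d/2, and the cos/sin table has to be laid out to match whichever rotation is used:

```python
import torch

def rotate_every_two(x):
    # Interleaved pairing from the RoFormer paper: dims (0, 1), (2, 3), ...
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    return torch.stack((-x_odd, x_even), dim=-1).flatten(-2)

def rotate_half(x):
    # Half-split pairing: dim i is paired with dim i + d/2.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

# The cos/sin layout must match the pairing:
#   interleaved -> freqs.repeat_interleave(2, dim=-1)
#   half-split  -> torch.cat((freqs, freqs), dim=-1)
```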
(The screenshot below shows what I’m referring to, if only in rough detail: dummy data, torch.arange(0, 256).unsqueeze(0).repeat(16, 1), encoded with this rotary PE code; that input is why there’s the smooth background color change from 0 to 256.)
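
For reference, a rough reproduction of that figure under the half-split convention (my own sketch; the exact code behind the screenshot isn’t shown here):

```python
import torch
import matplotlib.pyplot as plt

seq_len, dim, base = 16, 256, 10000.0

# Dummy data: every position holds the same values 0..255, so any difference
# between rows comes purely from the positional encoding.
x = torch.arange(0, dim).unsqueeze(0).repeat(seq_len, 1).float()

# Half-split ("rotate_half") rotary encoding, matching the layout discussed above.
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
freqs = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, dim/2)
emb = torch.cat((freqs, freqs), dim=-1)                       # (seq_len, dim)
x1, x2 = x.chunk(2, dim=-1)
encoded = x * emb.cos() + torch.cat((-x2, x1), dim=-1) * emb.sin()

plt.imshow(encoded, aspect="auto")
plt.xlabel("embedding dim")
plt.ylabel("position")
plt.show()
```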

(Note: I haven’t examined this rigorously, but this quick check raises my concern that the implementation isn’t correct.)