The confusing parts of phi-3's implementation

As far as I know, phi-3’s Phi3SuScaledRotaryEmbedding is an implementation of LongRoPE. However, several aspects of the Hugging Face transformers implementation leave me puzzled.

First, there is how the scale is determined. The code computes it from two configuration constants (max_position_embeddings and original_max_position_embeddings), so for any given model only one branch of the condition can ever be taken.
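To make this concrete, here is a minimal sketch of the scale logic as I read it. It assumes the code divides max_position_embeddings by original_max_position_embeddings and, when that ratio exceeds 1, multiplies cos/sin by sqrt(1 + log(scale) / log(original_max_position_embeddings)). The function name is hypothetical, and the 131072/4096 values are assumed for a 128k-context checkpoint.

```python
import math


def attention_scaling_factor(max_position_embeddings: int,
                             original_max_position_embeddings: int) -> float:
    """Hypothetical helper mirroring the scale branch as I understand it.

    Both arguments are static config values, so for a given checkpoint
    the branch taken below is always the same, whatever the input length.
    """
    scale = max_position_embeddings / original_max_position_embeddings
    if scale <= 1.0:
        # No extension: cos/sin are left unscaled.
        return 1.0
    # Extended context: magnitude correction derived from the ratio only.
    return math.sqrt(1 + math.log(scale) / math.log(original_max_position_embeddings))


# Assumed 128k-context config: 131072 / 4096 = 32.
print(attention_scaling_factor(131072, 4096))
```

With these assumed values, log(32) / log(4096) = 5/12, so the factor is sqrt(17/12), roughly 1.19. Since neither input depends on the actual sequence, the result is fixed per checkpoint.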

If I’m not mistaken, this is meant to implement the part of LongRoPE that leaves the starting tokens un-interpolated.
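To compare the two ideas, here is a minimal plain-Python sketch. longrope_angles follows my understanding of LongRoPE (positions before a threshold n_hat keep the original RoPE frequencies, later positions use the searched rescale factors), while switched_angles mimics a single short/long factor switch chosen from the total sequence length. All function names, the toy dim/base values, and the factors are hypothetical; this is not the transformers code.

```python
from typing import List, Sequence


def rope_inv_freq(dim: int, base: float, factors: Sequence[float]) -> List[float]:
    # inv_freq_i = 1 / (factor_i * base^(2i / dim)); a factor of 1.0 means plain RoPE.
    return [1.0 / (factors[i] * base ** (2 * i / dim)) for i in range(dim // 2)]


def longrope_angles(pos: int, dim: int, base: float,
                    rescale: Sequence[float], n_hat: int) -> List[float]:
    # Per-position choice: the first n_hat positions are not interpolated.
    factors = [1.0] * (dim // 2) if pos < n_hat else list(rescale)
    return [pos * f for f in rope_inv_freq(dim, base, factors)]


def switched_angles(pos: int, seq_len: int, dim: int, base: float,
                    short_factor: Sequence[float], long_factor: Sequence[float],
                    original_max: int) -> List[float]:
    # Per-sequence choice: one factor set for every position,
    # selected only by the total sequence length.
    factors = long_factor if seq_len > original_max else short_factor
    return [pos * f for f in rope_inv_freq(dim, base, factors)]


# Position 1 is below n_hat=2, so it keeps plain RoPE; position 3 is rescaled.
print(longrope_angles(1, 4, 10000.0, [2.0, 4.0], n_hat=2))
print(longrope_angles(3, 4, 10000.0, [2.0, 4.0], n_hat=2))
```

The key difference is what drives the choice: in the LongRoPE-style function it is each token's own position, whereas in the switch it is the length of the whole sequence, so the same early token receives different angles depending on how long the input is.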

Additionally, this piece of code seems to disregard common coding conventions. self.inv_freq is declared but left uninitialized, and it appears it could instead be precomputed as self.long_inv_freq and self.short_inv_freq during initialization.
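A hypothetical restructuring along those lines might look like the sketch below. It is plain Python for self-containment; in an actual torch.nn.Module the two tables would more likely be tensors registered with register_buffer(..., persistent=False). The class name, method names, and arguments are illustrative and not the transformers API.

```python
from typing import List, Sequence


class SuScaledRotarySketch:
    """Hypothetical: builds both inverse-frequency tables once in __init__."""

    def __init__(self, dim: int, base: float,
                 short_factor: Sequence[float], long_factor: Sequence[float],
                 original_max_position_embeddings: int) -> None:
        self.original_max_position_embeddings = original_max_position_embeddings
        # Precompute both tables instead of assigning self.inv_freq on every call.
        self.short_inv_freq = self._inv_freq(dim, base, short_factor)
        self.long_inv_freq = self._inv_freq(dim, base, long_factor)

    @staticmethod
    def _inv_freq(dim: int, base: float, factors: Sequence[float]) -> List[float]:
        return [1.0 / (factors[i] * base ** (2 * i / dim)) for i in range(dim // 2)]

    def select_inv_freq(self, seq_len: int) -> List[float]:
        # Pure selection: no attribute is mutated during the forward pass.
        if seq_len > self.original_max_position_embeddings:
            return self.long_inv_freq
        return self.short_inv_freq

    def angles(self, position_ids: Sequence[int]) -> List[List[float]]:
        seq_len = max(position_ids) + 1
        inv_freq = self.select_inv_freq(seq_len)
        return [[p * f for f in inv_freq] for p in position_ids]


rope = SuScaledRotarySketch(4, 10000.0, [1.0, 1.0], [2.0, 4.0],
                            original_max_position_embeddings=8)
print(rope.angles([0, 1, 2]))  # short table: seq_len 3 <= 8
print(rope.angles([0, 9]))     # long table: seq_len 10 > 8
```

Keeping the forward pass free of attribute assignment also makes the module's state independent of whichever input it saw last.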

Does anyone understand the deeper purpose of these implementations, or is this simply a bug?
