Electra relative position embedding ("relative_key_query")


If we use the approach described in the paper “Improve Transformer Models with Better Relative Position Embeddings”, we could in theory extend the model to sequence lengths of 2048 tokens, since there are no absolute position embeddings and the weights at the -k and k ends of the window can be duplicated to cover an arbitrary length. Is my assumption correct?
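
To make the idea concrete, here is a rough sketch of the extension I have in mind. It is plain Python, not the actual model code. The function name `extend_relative_table` is hypothetical, and I am assuming the relative-position table has `2 * old_max_len - 1` rows, with row `d + old_max_len - 1` holding the embedding for relative distance `d`. Distances outside the original window simply reuse the edge rows.

```python
def extend_relative_table(table, old_max_len, new_max_len):
    """Extend a relative-position table by repeating its edge rows.

    Assumption: `table` has 2 * old_max_len - 1 rows, and row
    (d + old_max_len - 1) holds the embedding for relative distance d.
    """
    if len(table) != 2 * old_max_len - 1:
        raise ValueError("unexpected table size")
    k = old_max_len - 1  # largest distance the original table covers
    extended = []
    for d in range(-(new_max_len - 1), new_max_len):
        clipped = max(-k, min(k, d))  # distances beyond +/-k reuse the edge row
        extended.append(list(table[clipped + k]))
    return extended


# Toy example: 1-dimensional embeddings, original table covers distances -2..2
toy = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
longer = extend_relative_table(toy, old_max_len=3, new_max_len=5)
print(len(longer))  # 9 rows, for distances -4..4
```

For the real model the same copy would have to be applied to the relative-position embedding weights of every attention layer, and I do not know whether quality holds up without further fine-tuning at the longer length.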

Thank you