Positional Embeddings in Transformer Implementations

Nice question. Did you ever find an answer to this?
On the 2nd point, take translation as an example: the encoder has to learn the relative positions of tokens in the source sentence, while the decoder has to learn the positions of tokens in the target language. Since the two languages differ in word order and sentence structure, sharing the same positional embeddings between encoder and decoder would not be useful.
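A minimal sketch of what that looks like in code, assuming learned (trainable) positional embeddings and using hypothetical names like `src_pos` / `tgt_pos` (not from any specific library): the encoder and decoder each get their own positional table.

```python
import torch
import torch.nn as nn

class Seq2SeqWithSeparatePositions(nn.Module):
    """Sketch: encoder and decoder each own a learned positional table,
    since source and target sentences differ in order and length."""

    def __init__(self, vocab_size, d_model=512, max_len=512):
        super().__init__()
        self.src_tok = nn.Embedding(vocab_size, d_model)
        self.tgt_tok = nn.Embedding(vocab_size, d_model)
        # Separate positional embeddings: positions in the source language
        # and in the target language are learned independently.
        self.src_pos = nn.Embedding(max_len, d_model)
        self.tgt_pos = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)

    def embed(self, ids, tok_emb, pos_emb):
        # Add the positional vector for each index to the token embedding.
        positions = torch.arange(ids.size(1), device=ids.device)
        return tok_emb(ids) + pos_emb(positions)[None, :, :]

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids, self.src_tok, self.src_pos)
        tgt = self.embed(tgt_ids, self.tgt_tok, self.tgt_pos)
        return self.transformer(src, tgt)
```

If instead you used one shared positional table, both languages would be forced onto the same learned position representation, which is exactly what the argument above says you want to avoid.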
The 3rd point might just be a matter of how the feature was implemented in that codebase, but the default behavior should be overridden if it doesn't fit your use case.