I have 3 questions about positional embeddings of Transformer models.
1- The BART model’s positional embeddings are created with an offset of 2. That means setting max_position_embeddings = 1024 results in a positional embedding matrix of shape (1026, d_model).
This is not the case for PEGASUS. Why do the two models differ?
2- Why do we have separate positional embedding layers for the encoder and decoder? This is the case for both BART and PEGASUS, even though the PEGASUS paper in fact used sinusoidal embeddings, which by definition should be identical for the encoder and decoder.
3- By default, Hugging Face Transformers models learn their positional embeddings as a regular Embedding layer. Is it not possible to set them to sinusoidal, perhaps with a flag such as “use_sinusoidal_embedding = True”?
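
To make the questions concrete, here is a minimal sketch of what I mean. It is not taken from the library source. The helper name `sinusoidal_table`, the small `d_model`, and the interleaved sin/cos layout (as in "Attention Is All You Need") are my own assumptions, and real implementations may lay the columns out differently.

```python
import math

def sinusoidal_table(num_positions, d_model):
    """Hypothetical helper: fixed sinusoidal encodings, one row per position."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Columns 2k and 2k+1 share one frequency: sin on even, cos on odd.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

max_position_embeddings = 1024
d_model = 16  # assumed small value, only to keep the example fast
offset = 2    # the two extra rows from question 1

# Question 1: a learned table with the offset has 1024 + 2 rows.
learned_shape = (max_position_embeddings + offset, d_model)
print(learned_shape)  # → (1026, 16)

# Question 2: the sinusoidal table depends only on its arguments, so
# building it once for the encoder and once for the decoder gives the same values.
encoder_table = sinusoidal_table(max_position_embeddings, d_model)
decoder_table = sinusoidal_table(max_position_embeddings, d_model)
tables_match = encoder_table == decoder_table
print(tables_match)  # → True
```

Since such a table has no trainable parameters, this is what I would expect a sinusoidal option (question 3) to compute in place of a learned Embedding layer.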