Positional Embeddings in Transformer Implementations

meliksahturker · September 27, 2022, 11:04am

I have 3 questions about positional embeddings of Transformer models.

1- The BART model’s positional embedding table is allocated with an offset of 2. That means setting max_position_embeddings = 1024 results in a positional embedding matrix of shape (1026, d_model).
This is not the case for PEGASUS. Why the difference?

2- Why do we have separate positional embedding layers for the Encoder and the Decoder? This is the case for both BART and PEGASUS, whereas the PEGASUS paper in fact used sinusoidal embeddings, which by definition should be the same for both Encoder and Decoder.

3- By default, Hugging Face Transformers models learn their positional embeddings as a regular Embedding layer. Is it not possible to set them to sinusoidal, perhaps with a flag such as “use_sinusoidal_embedding = True”?
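
The shape arithmetic in question 1 can be sketched in plain Python. This is a toy illustration, not the library's code: the class name `OffsetPositionTable`, its `lookup` method, and the random initialization are all assumptions made for the example. It only shows how reserving an offset of 2 turns max_position_embeddings = 1024 into a table with 1026 rows.

```python
import random


class OffsetPositionTable:
    """Hypothetical stand-in for a learned positional embedding with a row offset."""

    def __init__(self, max_positions, d_model, offset=2, seed=0):
        rng = random.Random(seed)
        self.offset = offset
        # The table holds max_positions + offset rows, so 1024 becomes 1026.
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(max_positions + offset)
        ]

    @property
    def shape(self):
        return (len(self.weight), len(self.weight[0]))

    def lookup(self, position):
        # Position p is stored at row p + offset, so the first `offset` rows
        # are never returned by this method.
        return self.weight[position + self.offset]


table = OffsetPositionTable(max_positions=1024, d_model=8)
print(table.shape)  # (1026, 8)
```

With this layout, position 0 reads row 2 and position 1023 reads row 1025, which is why the extra two rows are needed.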

HarsimarSingh · September 3, 2024, 5:09am

Nice question. Did you find an answer to this?
About the 2nd point, let’s take translation as an example. The encoder has to learn the positions of tokens in the source sentence, while the decoder has to learn the positions of tokens in the translated language, so sharing the same embeddings would not be useful.
The 3rd point might be a feature request. But it should be possible to override the embedding layer.
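
For the sinusoidal case raised in questions 2 and 3, here is a minimal sketch in plain Python, assuming the interleaved sine/cosine layout from the original Transformer formulation. The function name `sinusoidal_table` is hypothetical, and actual library implementations may order the columns differently. It shows that a fixed table has nothing to learn, so building it once for the encoder and once for the decoder gives identical values.

```python
import math


def sinusoidal_table(num_positions, d_model):
    """Build a fixed table: even columns use sine, odd columns use cosine."""
    table = []
    for pos in range(num_positions):
        row = []
        for col in range(d_model):
            # Columns 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
            angle = pos / (10000 ** ((col - col % 2) / d_model))
            row.append(math.sin(angle) if col % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


encoder_positions = sinusoidal_table(num_positions=16, d_model=4)
decoder_positions = sinusoidal_table(num_positions=16, d_model=4)

# Nothing is learned, so the two tables are equal element by element.
print(encoder_positions == decoder_positions)  # True
print(encoder_positions[0])  # [0.0, 1.0, 0.0, 1.0]
```

Overriding a learned embedding with such a table would then amount to copying these values into the layer's weights and excluding them from training, though the details depend on the model class.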
