Why are positional embeddings implemented as just simple embeddings?

In theory, the trigonometric (sinusoidal) functions can generalize to positions not seen at training time. They also let the model rely on relative rather than absolute positions, and as such their dot product can be computed more efficiently, as shown in the Transformer-XL paper.
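
For reference, here is a minimal sketch of the fixed sinusoidal encoding from the original Transformer paper, assuming PyTorch and an even `d_model` (the function name is just illustrative):

```python
import math
import torch

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)        # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                    * (-math.log(10000.0) / d_model))                  # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe  # fixed (no trainable parameters), so it can be computed for any max_len
```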

On the other hand, learned index embeddings add more parameters, which might enable the model to learn faster in some situations.
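
The learned variant is essentially just a lookup table indexed by position, roughly like this (a sketch; the sizes are hypothetical, BERT-base-like):

```python
import torch
import torch.nn as nn

max_len, d_model = 512, 768                            # hypothetical sizes
position_embeddings = nn.Embedding(max_len, d_model)   # one trainable vector per position index

position_ids = torch.arange(max_len)                   # 0, 1, ..., max_len - 1
pos_emb = position_embeddings(position_ids)            # (max_len, d_model), trained with the rest of the model
# Positions >= max_len have no row in the table, so the model cannot extrapolate past them.
```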

As with many other things, it really depends on your use case :wink:
