Hi @miguelvictor! Both are valid strategies: iirc the original Transformer paper used fixed (non-learned) sinusoidal position embeddings, while BERT learns a full vector for each of its 512 expected positions.
Currently, the Transformers library has sinusoidal embeddings in the TransfoXL model; check it out!
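In case it helps, here's a minimal PyTorch sketch of the two approaches (the dimensions and names are illustrative, not the library's actual code):

```python
import torch
import torch.nn as nn

def sinusoidal_embeddings(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal table, as in "Attention Is All You Need" (d_model assumed even)."""
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)            # (max_len, 1)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2).float() / d_model)) # (d_model/2,)
    angles = position * inv_freq                                                # (max_len, d_model/2)
    emb = torch.zeros(max_len, d_model)
    emb[:, 0::2] = torch.sin(angles)  # even dimensions get sine
    emb[:, 1::2] = torch.cos(angles)  # odd dimensions get cosine
    return emb

# BERT-style: one trainable vector per position (512 positions for BERT base).
learned_pos_emb = nn.Embedding(num_embeddings=512, embedding_dim=768)

positions = torch.arange(128)                       # positions of the tokens in a sequence
fixed = sinusoidal_embeddings(512, 768)[positions]  # lookup into the fixed table, no gradients needed
learned = learned_pos_emb(positions)                # trainable lookup, updated during training
```

The trade-off is roughly: the sinusoidal table needs no parameters and can in principle extrapolate to positions beyond those seen in training, while learned embeddings can fit the data better but are capped at the number of positions you allocate up front.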