Hi @miguelvictor! Both are valid strategies: iirc the original Transformer paper used fixed (non-learned) sinusoidal position embeddings, while BERT learns a full vector for each of its 512 expected positions.
Currently, the Transformers library has sinusoidal embeddings in the TransfoXL model; check it out!
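In case it helps, here's a minimal PyTorch sketch of the two approaches (the dimensions and names are illustrative, not the library's actual code):

```python
import torch
import torch.nn as nn

def sinusoidal_embeddings(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal table, as in "Attention Is All You Need" (d_model assumed even)."""
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)            # (max_len, 1)
    inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2).float() / d_model)) # (d_model/2,)
    angles = position * inv_freq                                                # (max_len, d_model/2)
    emb = torch.zeros(max_len, d_model)
    emb[:, 0::2] = torch.sin(angles)  # even dimensions get sine
    emb[:, 1::2] = torch.cos(angles)  # odd dimensions get cosine
    return emb

# BERT-style: one trainable vector per position (512 positions for BERT base).
learned_pos_emb = nn.Embedding(num_embeddings=512, embedding_dim=768)

positions = torch.arange(128)                       # positions of the tokens in a sequence
fixed = sinusoidal_embeddings(512, 768)[positions]  # lookup into the fixed table, no gradients needed
learned = learned_pos_emb(positions)                # trainable lookup, updated during training
```

The trade-off is roughly: the sinusoidal table needs no parameters and can in principle extrapolate to positions beyond those seen in training, while learned embeddings can fit the data better but are capped at the number of positions you allocate up front.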