This is all to mimic the original implementation of RoBERTa. So no, RoBERTa does not use sinusoidal position embeddings. That’s also why we can’t change the padding_index for the position_ids, as it would break compatibility with the pretrained models.
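To make the offset concrete, here is a minimal sketch of how RoBERTa builds position ids from the padding index, following the logic of `create_position_ids_from_input_ids` in the Hugging Face implementation (which itself mimics fairseq): real tokens get positions starting at `padding_idx + 1`, and pad tokens are assigned position `padding_idx`.

```python
import torch

def create_position_ids_from_input_ids(input_ids: torch.Tensor, padding_idx: int) -> torch.Tensor:
    """Non-pad tokens get positions padding_idx + 1, padding_idx + 2, ...;
    pad tokens get position padding_idx itself."""
    mask = input_ids.ne(padding_idx).int()                   # 1 for real tokens, 0 for pads
    incremental_indices = torch.cumsum(mask, dim=1) * mask   # 1, 2, 3, ... over real tokens
    return incremental_indices.long() + padding_idx          # shift everything by padding_idx

# Example with RoBERTa's defaults: pad token id is 1, so positions start at 2.
input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])  # <s> ... </s> <pad> <pad>
print(create_position_ids_from_input_ids(input_ids, padding_idx=1))
# tensor([[2, 3, 4, 5, 1, 1]])
```

This offset is also why RoBERTa's position embedding table has `max_position_embeddings = 514` rather than 512: two extra slots are consumed by the padding index and the shifted starting position.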