PyTorch version

This is all done to mimic the original implementation of RoBERTa. So no, RoBERTa does not use sinusoidal position embeddings; it uses learned ones. That's also why we can't change the padding_index for the position_ids: it would break compatibility with the pretrained models.
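
For concreteness, here is a minimal sketch of how the position ids are derived from the input ids. It mirrors the logic of the `create_position_ids_from_input_ids` helper in `transformers`' `modeling_roberta`: padding positions stay frozen at `padding_idx`, and real tokens count up from `padding_idx + 1` (the exact tensor values below are illustrative, assuming RoBERTa's pad token id of 1):

```python
import torch

def create_position_ids(input_ids: torch.Tensor, padding_idx: int = 1) -> torch.Tensor:
    # Non-pad tokens get consecutive positions starting at padding_idx + 1;
    # pad tokens keep padding_idx itself, so embedding lookups stay aligned
    # with the pretrained weights.
    mask = input_ids.ne(padding_idx).int()
    positions = torch.cumsum(mask, dim=1) * mask
    return positions.long() + padding_idx

input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])  # last two tokens are padding
print(create_position_ids(input_ids))  # tensor([[2, 3, 4, 5, 1, 1]])
```

This offset is also why `roberta-base`'s position-embedding table has 514 rows rather than 512: the first usable position is `padding_idx + 1 = 2`, so two extra rows are reserved at the front.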