PyTorch version

Yes, that’s what I worked out. I’m not sure why the position IDs go from padding_index to maximum_sequence_length + padding_index, though (I actually think they start at padding_index + 1). You could set the padding_index of the position_ids embedding to 0 (i.e. different from the padding index of the input_ids) and, as far as I can see, it would still work fine, since the actual indices are always > 0 and there’s no chance of an index-out-of-range error (which doesn’t produce a meaningful exception from CUDA).
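For reference, here’s a rough sketch of how I understand the position IDs get built from the input_ids (the function name here is my own, not the library’s): non-pad tokens get positions starting at padding_index + 1, and pad tokens keep padding_index so they hit the zeroed padding row of the position embedding.

```python
import torch

def create_position_ids(input_ids: torch.Tensor, padding_idx: int) -> torch.Tensor:
    # Non-pad tokens get positions padding_idx + 1, padding_idx + 2, ...
    # Pad tokens stay at padding_idx, so the position embedding's
    # padding row (all zeros) is looked up for them.
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = torch.cumsum(mask, dim=1) * mask
    return incremental_indices.long() + padding_idx

# Example with padding_idx = 1 (RoBERTa's usual pad token id);
# the last two tokens are padding.
input_ids = torch.tensor([[0, 42, 17, 2, 1, 1]])
print(create_position_ids(input_ids, padding_idx=1))
# tensor([[2, 3, 4, 5, 1, 1]])  -> positions start at padding_idx + 1
```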

Also, did the original BERT / RoBERTa not use sinusoidal position embeddings? Or was that a later addition?