Hi there!
It’s hard to know exactly what’s going on without seeing your code, but here is what I can share about RoBERTa. You should not use max_position_embeddings as the maximum sequence length. Because the position IDs of RoBERTa go from padding_index to maximum_sequence_length + padding_index, max_position_embeddings is purposely set to 514 (512, the maximum sequence length, plus 2 for the padding index). You should use tokenizer.model_max_length instead (which should be 512).
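
For instance, here is a minimal sketch (assuming the standard `roberta-base` checkpoint) showing where the two numbers come from and how to truncate with the tokenizer's limit rather than the config value:

```python
# Minimal sketch, assuming the "roberta-base" checkpoint.
from transformers import RobertaConfig, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")

print(config.max_position_embeddings)  # 514 = 512 (max sequence length) + 2 (padding offset)
print(tokenizer.model_max_length)      # 512 -- this is the value to use for truncation

# Truncate inputs with the tokenizer's limit, not max_position_embeddings
enc = tokenizer(
    "Some long text ...",
    truncation=True,
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)
```

Feeding sequences truncated to 514 would shift the position IDs past the size of the position embedding matrix, which is typically what causes the indexing errors people run into here.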