Hi there!
It’s hard to know exactly what’s going on without seeing your code, but here is what I can share about RoBERTa. You should not use `max_position_embeddings` as a maximum sequence length. Because the position IDs of RoBERTa run from the padding index up to the maximum sequence length plus the padding index, `max_position_embeddings` is purposely set to 514 (512, the maximum sequence length, plus an offset of 2 to account for the padding index). You should use `tokenizer.model_max_length` instead (which should be 512).
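
For example, here is a minimal sketch (assuming the standard `roberta-base` checkpoint) showing the difference between the two values and how to truncate with the one you should actually use:

```python
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

print(config.max_position_embeddings)  # 514 -- size of the position embedding table, not the usable length
print(tokenizer.model_max_length)      # 512 -- the real maximum sequence length

# Truncate inputs to the model's real maximum sequence length
encoded = tokenizer(
    "some very long text ...",
    truncation=True,
    max_length=tokenizer.model_max_length,
)
```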