BERT's positional embeddings support at most 512 tokens.
Suppose each of my texts is 512 tokens long, and I want to pair them up to check pair similarity.
The BERT paper says the input should be [CLS] first sequence [SEP] second sequence [SEP].
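A minimal sketch of that pair layout in plain Python, assuming the texts are already tokenized into lists of strings. `build_pair_input` is a hypothetical helper, not a library function. It also builds the segment (token type) ids BERT uses to tell the two sequences apart: 0 for [CLS], the first sequence, and the first [SEP], and 1 for the second sequence and the final [SEP].

```python
def build_pair_input(tokens_a, tokens_b):
    """Hypothetical helper: lay out a pair as [CLS] A [SEP] B [SEP].

    Returns the token list and the segment ids (0 = first segment,
    1 = second segment).
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"] + list(tokens_b) + ["[SEP]"]
    # [CLS] + A + first [SEP] belong to segment 0
    # B + final [SEP] belong to segment 1
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segments = build_pair_input(["the", "cat"], ["a", "dog"])
print(tokens)    # ['[CLS]', 'the', 'cat', '[SEP]', 'a', 'dog', '[SEP]']
print(segments)  # [0, 0, 0, 0, 1, 1, 1]
```

Every special token occupies one of the 512 positions, so three positions are used up before any real text is counted.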
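That means the first and second sequences combined, plus the special tokens, must not exceed 512 tokens.
So each sequence gets only about 256 tokens (strictly, (512 - 3) / 2 = 254.5, because [CLS] and the two [SEP] tokens also count).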
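To make the budget arithmetic concrete, here is a minimal sketch of "longest-first" truncation, which repeatedly drops a token from the end of whichever sequence is currently longer until the pair fits alongside the 3 special tokens. This is one common strategy (Hugging Face tokenizers expose it as `truncation="longest_first"`). `truncate_pair` itself is a hypothetical helper written for illustration.

```python
MAX_LEN = 512      # BERT's maximum number of positions
NUM_SPECIAL = 3    # [CLS], [SEP], [SEP]


def truncate_pair(tokens_a, tokens_b, max_len=MAX_LEN):
    """Hypothetical helper: longest-first truncation of a sequence pair.

    Pops tokens from the end of the longer sequence (the first one on ties)
    until len(a) + len(b) + NUM_SPECIAL <= max_len.
    """
    a, b = list(tokens_a), list(tokens_b)
    budget = max_len - NUM_SPECIAL  # 509 tokens left for real text
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return a, b


# Two 512-token texts: together they must shrink to 509 tokens.
a, b = truncate_pair(range(512), range(512))
print(len(a), len(b))  # 254 255
```

With two full-length texts, roughly half of each one is cut off. That loss is exactly the problem described here.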
My question is: if each of my texts is 512 tokens long and I want to build a pair similarity classifier,
what should I do?