Good question. It also seems related to this unanswered question: RoBERTa classification (with article + sentence)
I'm pretty confused about this too. These docs seem relevant: RoBERTa
If our classification problem involves two pieces of text, should we be inserting the
`</s></s>`
separator tokens between them? Presumably that would hint to the model that it should look at the two pieces (somewhat) separately?
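For reference, here is the layout in question as I understand it: RoBERTa formats a sentence pair as `<s> A </s></s> B </s>`, and the Hugging Face tokenizer adds these special tokens on its own when you call it with two texts (e.g. `tokenizer(text_a, text_b)`), so you shouldn't need to type them into the strings yourself. Below is a minimal pure-Python sketch of that layout. `build_roberta_pair` is a hypothetical helper for illustration only, and it works on whole-word strings rather than the subword ids a real tokenizer produces.

```python
from typing import List

# Hypothetical helper illustrating RoBERTa's pair template: <s> A </s></s> B </s>.
# Tokens are plain word strings here; a real tokenizer uses subword ids.
BOS, SEP = "<s>", "</s>"


def build_roberta_pair(tokens_a: List[str], tokens_b: List[str]) -> List[str]:
    """Wrap two token sequences in RoBERTa-style special tokens."""
    # A doubled separator sits between the two segments. RoBERTa does not use
    # segment (token_type_ids) information, so the separators mark the boundary.
    return [BOS] + tokens_a + [SEP, SEP] + tokens_b + [SEP]


article = ["The", "match", "ended", "in", "a", "draw", "."]
sentence = ["Nobody", "won", "."]
print(" ".join(build_roberta_pair(article, sentence)))
# → <s> The match ended in a draw . </s> </s> Nobody won . </s>
```

In other words, the "hint" comes from the doubled `</s>` boundary rather than from separate segment embeddings.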