What is the separator token for joining two input sequences in GPT?
Given a pretrained GPT-2, I'm interested in fine-tuning it for a question-answering task: given a question string and an answer string, classify 1 or 0 depending on whether the answer actually answers the question.
In BERT, the input sequences are separated with a [SEP] token, and this classification can be done by feeding in one sequence: question_text [SEP] answer_text
What separator token does GPT require for this? If I'm fine-tuning a pretrained model, the separator token would never have been encountered during pretraining, so can I use any token I wish, and will the model simply learn to treat it as the separator?
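For context, here is roughly what I had in mind, sketched with the HuggingFace transformers API. The [SEP] and [PAD] strings are my own arbitrary choices, not tokens GPT-2 ships with (its only built-in special token is <|endoftext|>):

```python
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# GPT-2 has no [SEP] of its own, so register one (plus a pad token for batching)
tokenizer.add_special_tokens({"sep_token": "[SEP]", "pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))  # new embedding rows start randomly initialized
model.config.pad_token_id = tokenizer.pad_token_id

question_text = "What is the capital of France?"
answer_text = "Paris is the capital of France."

inputs = tokenizer(question_text + " [SEP] " + answer_text, return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2): not-the-answer vs. the-answer
```

Since the new embedding row for [SEP] starts out randomly initialized, is fine-tuning on my dataset enough for the model to learn it?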