Does it make sense to use the [CLS] token on RoBERTa-based models?

Hello,

I know that some transformer models are not pre-trained with the Next Sentence Prediction objective, like RoBERTa-based models. In that case, the [CLS] token does not mean anything, right?

Given that the [CLS] token is not pre-trained for a sentence-level objective, when developing my downstream classification task, would it be better to fine-tune this [CLS] token or to perform average pooling over all the tokens?

Thanks in advance,
Bruno

Hi, the importance of the [CLS] token is not limited to the NSP (Next Sentence Prediction) task. As far as I understand its functioning, you can use it for fine-tuning on other tasks too: [CLS] is a special token that attends to all other tokens in the sequence, so its representation captures the knowledge from the whole context of the sequence.
Extending this to the NSP task, it learns its representation through self-attention, looking at all the tokens in the context (from both sequences of the input pair).
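To make the two options concrete, here is a minimal sketch of [CLS] pooling versus attention-mask-aware mean pooling. It assumes you already have the encoder's `last_hidden_state` and `attention_mask` (here replaced by dummy NumPy arrays; the array names and shapes mirror a typical Hugging Face encoder output, but this is an illustration, not the library's internals):

```python
import numpy as np

def cls_pooling(last_hidden_state):
    # [CLS] (RoBERTa's <s>) is the first token of every sequence
    return last_hidden_state[:, 0, :]

def mean_pooling(last_hidden_state, attention_mask):
    # Average only over real tokens, masking out padding positions
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = mask.sum(axis=1)
    return summed / counts

# Dummy encoder output: batch of 2 sequences, 4 tokens, hidden size 8
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))
mask = np.array([[1, 1, 1, 0],   # first sequence has one padding token
                 [1, 1, 1, 1]])

cls_vec = cls_pooling(hidden)          # shape (2, 8)
mean_vec = mean_pooling(hidden, mask)  # shape (2, 8)
```

Either vector can then be fed to a classification head; the mask-aware mean is what prevents padding tokens from diluting the average.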

My doubt is how the [CLS] token learns the context of the sequence if it is not used during pre-training (as in RoBERTa-based models).

If I understood correctly, this [CLS] token learns only during fine-tuning, which is still good enough for text classification.