Significance of the [CLS] token

BramVanroy · September 5, 2021, 7:31am

Tbh this is a bit confusing.

"This is how I like to think of the [CLS] token: a weighted average of the words such that the representation of the whole sequence is captured."

That’s the thing: it is not a weighted average at all. It is a special token in its own right, one that is pretrained and is useful in fine-tuning, too.
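
To make the distinction concrete, here is a minimal toy sketch in plain Python. It is not BERT's actual code: the vocabulary, the IDs, the 4-dimensional random "embeddings", and the helper names `encode`, `embed` and `mean_pool` are all made up for illustration. It only shows that [CLS] is a vocabulary entry with its own embedding row that gets prepended to every input, whereas an average of the word vectors is something you would have to compute yourself.

```python
import random

# Toy vocabulary (hypothetical IDs): [CLS] is an entry like any other token.
VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
         "the": 3, "movie": 4, "was": 5, "great": 6, "bad": 7}
HIDDEN = 4  # toy hidden size, far smaller than a real model's

random.seed(0)
# One embedding row per vocabulary entry. In a real model these rows are
# learned during pretraining; here they are just random numbers.
EMBEDDINGS = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in VOCAB]


def encode(words):
    """Prepend [CLS] and append [SEP], as BERT-style tokenizers do."""
    return [VOCAB["[CLS]"]] + [VOCAB[w] for w in words] + [VOCAB["[SEP]"]]


def embed(ids):
    """Look up the embedding row for each token ID."""
    return [EMBEDDINGS[i] for i in ids]


def mean_pool(vectors):
    """An actual average of the given vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(HIDDEN)]


ids_a = encode(["the", "movie", "was", "great"])
ids_b = encode(["the", "movie", "was", "bad"])
vecs_a, vecs_b = embed(ids_a), embed(ids_b)

# Position 0 is always the [CLS] row, whatever the sentence is.
cls_a, cls_b = vecs_a[0], vecs_b[0]
# The mean over the word vectors (excluding [CLS] and [SEP]) depends on the words.
avg_a, avg_b = mean_pool(vecs_a[1:-1]), mean_pool(vecs_b[1:-1])

print(ids_a)           # first ID is the [CLS] ID
print(cls_a == cls_b)  # same embedding row for both sentences
print(avg_a == avg_b)  # the averages differ because the words differ
```

In a real model the transformer layers then update the vector at position 0 using the rest of the sequence, and a classification head typically reads that final hidden state. The sketch stops at the input embeddings on purpose, so it says nothing about what the trained [CLS] output looks like.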
