Which token vector is used for Sentiment Analysis?

In most BERT-based sentiment analysis implementations I have seen, only the [CLS] embedding is passed to the classifier, while the other token embeddings go unused. What is the reason behind this?

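According to the paper, the final hidden state of BERT’s [CLS] token is used as the aggregate representation of the whole sequence for classification tasks. Through self-attention, [CLS] attends to every other token in each layer, and fine-tuning trains it to collect what the classifier needs. That is what renders the other tokens’ outputs “useless” for sequence classification: the relevant info is already pooled into [CLS].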
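
To make the pooling idea concrete, here is a toy sketch rather than BERT itself. It assumes a single self-attention head with random weights and made-up input embeddings; the names `self_attention`, `w_q`, `w_k`, `w_v` and `w_out` are hypothetical and chosen only for illustration. It shows that the [CLS] output row is a weighted mix of all tokens, and that only that row is handed to the classifier.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (seq_len, dim) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = ["[CLS]", "great", "movie", "[SEP]"]
dim, num_classes = 8, 2

# Stand-in input embeddings: random values, one row per token (an assumption,
# real BERT sums token, segment, and position embeddings here).
x = rng.normal(size=(len(tokens), dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

hidden, weights = self_attention(x, w_q, w_k, w_v)

# Row 0 of the attention matrix: how much [CLS] draws from each token.
cls_weights = weights[0]

# Only the [CLS] row goes to the classifier head; the other rows are ignored.
cls_vector = hidden[0]
w_out = rng.normal(size=(dim, num_classes))
probs = softmax(cls_vector @ w_out)

for tok, w in zip(tokens, cls_weights):
    print(f"{tok:>6}: {w:.3f}")
print("class probabilities:", probs)
```

Real BERT stacks many such layers with multiple heads, and the weights are learned, so after fine-tuning the [CLS] row ends up mixing in whatever the sentiment classifier finds useful.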


Can you please elaborate on that? I see that the transformers/BERT layers have Token Embeddings, Segment Embeddings, and Position Embeddings, along with [CLS] and [SEP].