Hello there,
In the BERT documentation, it is mentioned that the classification head is “(a linear layer on top of the pooled output)”.
My understanding is that the pooled output is computed over all the token outputs and not just the [CLS] token (otherwise it wouldn't be "pooled", I guess).
I would like some confirmation on this, because it is not 100% clear to readers.
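
In case it helps, here is a minimal sketch of how I think this could be checked, assuming the Hugging Face `transformers` `BertModel` and a tiny randomly initialised config (all sizes are arbitrary choices of mine, so nothing needs to be downloaded). It compares `pooler_output` against two hand-computed candidates: one built only from the first ([CLS]) position, and one built from the mean over all positions. The mean is just my guess at what "pooled across all outputs" could look like. Whichever comparison prints `True` should indicate which reading is right.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny, randomly initialised BERT so no checkpoint is downloaded.
# Every size below is an arbitrary assumption made for this experiment.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
)
torch.manual_seed(0)
model = BertModel(config)
model.eval()  # disable dropout so the comparison is deterministic

# Batch of 2 sequences, 8 random token ids each
input_ids = torch.randint(0, config.vocab_size, (2, 8))

with torch.no_grad():
    outputs = model(input_ids)
    last_hidden = outputs.last_hidden_state  # shape (2, 8, 32)
    pooled = outputs.pooler_output           # shape (2, 32)

    # Candidate A: pooler applied to the first ([CLS]) position only
    cls_based = torch.tanh(model.pooler.dense(last_hidden[:, 0]))
    # Candidate B: pooler applied to the mean over all positions (my guess)
    mean_based = torch.tanh(model.pooler.dense(last_hidden.mean(dim=1)))

matches_cls = torch.allclose(pooled, cls_based, atol=1e-6)
matches_mean = torch.allclose(pooled, mean_based, atol=1e-6)
print("matches CLS-based pooling:", matches_cls)
print("matches mean-based pooling:", matches_mean)
```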
Thanks in advance.