Hello,
I have been reading the documentation of the BeiT model here. In the section on `pooler_output`, this is what is written:
> Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
After going through the code (here: [transformers/modeling_beit.py at master · huggingface/transformers · GitHub]), the pooler output is actually the mean of the final hidden states, not a linear projection of the CLS token.
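For reference, this is a minimal sketch of what the pooler appears to do based on my reading of `modeling_beit.py`. The class name `BeitPoolerSketch` and its constructor arguments are just illustrative, not the actual Transformers API:

```python
import torch
from torch import nn

class BeitPoolerSketch(nn.Module):
    """Rough sketch of the BEiT pooler as I understand it from modeling_beit.py.
    Note there is no CLS -> linear -> tanh step as the docs describe."""

    def __init__(self, hidden_size: int, use_mean_pooling: bool = True):
        super().__init__()
        # When mean pooling is enabled, the patch tokens are averaged and layer-normed;
        # there is no linear projection + tanh applied to the CLS token.
        self.layernorm = nn.LayerNorm(hidden_size) if use_mean_pooling else None

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.layernorm is not None:
            patch_tokens = hidden_states[:, 1:, :]           # drop the CLS token
            return self.layernorm(patch_tokens.mean(dim=1))  # mean over patch tokens
        # Fallback: just the final hidden state of the CLS token
        return hidden_states[:, 0]

# Example: batch of 2 sequences, 197 tokens (1 CLS + 196 patches), hidden size 768
pooler = BeitPoolerSketch(hidden_size=768)
pooled = pooler(torch.randn(2, 197, 768))
print(pooled.shape)  # torch.Size([2, 768])
```

Either way, the behaviour I see in the code does not match the quoted docstring.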
Would it be possible to update the documentation? The current wording creates confusion when reading through it.
Note: I thought of raising the issue in the GitHub repo, but couldn't find how to do so in the case of documentation.