Issue in the documentation of transformers for BEiT

Hello,

I have been reading the documentation of the BEiT model here. In the section on pooler_output, this is what is written:

> Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for the BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.

After going through the code ([here](https://github.com/huggingface/transformers/blob/master/src/transformers/models/beit/modeling_beit.py#L666-L667)), the pooler output is actually the mean of the final hidden states of the patch tokens (followed by a layer norm), not a linear projection of the [CLS] token.
Is it possible to update the documentation? The current wording is confusing when reading through it.
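
For reference, here is a condensed sketch of the pooler logic at those lines (paraphrased from modeling_beit.py, not copied verbatim):

```python
import torch
from torch import nn


class BeitPooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        # The layer norm is only created when mean pooling is enabled
        self.layernorm = (
            nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
            if config.use_mean_pooling
            else None
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.layernorm is not None:
            # Mean-pool the final hidden states of the patch tokens
            # (everything except the first, [CLS], token)
            patch_tokens = hidden_states[:, 1:, :]
            pooled_output = self.layernorm(patch_tokens.mean(1))
        else:
            # Otherwise, fall back to the final hidden state of the [CLS] token
            pooled_output = hidden_states[:, 0]
        return pooled_output
```

So the documented "linear layer and a tanh activation" never happens for BEiT.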

Note: I thought of raising this issue in the GitHub repo, but couldn’t figure out how to do that for documentation issues.

Hi,

Thanks for reporting. Indeed, BeitModel currently returns an output of type BaseModelOutputWithPooling. This is a generic class, defined here, from which the model’s documentation is generated automatically. In this case, however, it would be better to define a custom BeitModelOutput that describes the model’s actual outputs.

Do you mind opening a PR for this? This would mean defining a new dataclass within modeling_beit.py.
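
Something along these lines should work: subclass the generic output class and override only the docstring, so that pooler_output is described correctly (a minimal sketch; the class name and exact wording are just suggestions):

```python
from dataclasses import dataclass

from transformers.modeling_outputs import BaseModelOutputWithPooling


@dataclass
class BeitModelOutputWithPooling(BaseModelOutputWithPooling):
    """
    Class for outputs of BeitModel.

    Args:
        pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)):
            Average of the last layer hidden states of the patch tokens
            (excluding the [CLS] token), passed through a layer norm, if
            config.use_mean_pooling is set to True. Otherwise, the final
            hidden state of the [CLS] token.
    """
```

Then BeitModel would return this class instead of the generic one, and the generated documentation would match the code.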

Otherwise, I’ll do it :wink:

Hello,

I opened a pull request here: Added Beit model ouput class by lumliolum · Pull Request #14133 · huggingface/transformers · GitHub

Can you tell me why the CircleCI `check_code_quality` test is failing?