Always only a single Linear layer as the classification head?

For example, in the `BertForSequenceClassification` class definition, only one Linear layer is used as the classifier. If just one Linear layer is used, doesn't it merely apply a linear projection to `pooled_output`? Can such a classifier still produce good predictions? Why not use multiple Linear layers? Does transformers offer any option for using multiple Linear layers as the classification head?
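To make the question concrete, here is a sketch of the kind of multi-layer head I have in mind, as a drop-in replacement for the single `nn.Linear` classifier applied to `pooled_output`. The module and its names (`MultiLayerClassificationHead`, `dense`, `out_proj`) are my own illustration, not anything provided by transformers:

```python
import torch
import torch.nn as nn

class MultiLayerClassificationHead(nn.Module):
    """Hypothetical two-layer head: Linear -> activation -> dropout -> Linear,
    instead of the single Linear projection in BertForSequenceClassification."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        # Non-linear transform of the pooled [CLS] representation,
        # then project down to one logit per label.
        x = self.dropout(self.activation(self.dense(pooled_output)))
        return self.out_proj(x)

# Stand-in for BERT's pooled_output: (batch_size, hidden_size)
pooled = torch.randn(4, 768)
head = MultiLayerClassificationHead(hidden_size=768, num_labels=3)
logits = head(pooled)
print(logits.shape)
```

As far as I can tell, one could subclass the model and assign such a module to `self.classifier`, since the forward pass just calls whatever `self.classifier` is on the pooled output, but I'd like to know if there is a built-in option.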