TFBertModel for classification task with no CLS token

Hello, I’m reading a paper where BERT (TFBertModel) and RoBERTa (TFRobertaModel) are used to solve a text classification task.

Going through the implementation, I noticed that each text sample is tokenized without special tokens: `tokenizer.encode(sentence, add_special_tokens=False)`.

Later on, the tokenizer outputs are passed to the respective models and the pooled output is retrieved, as follows:

```python
embedding_BERT = encoder_BERT(
```

  1. The authors claim to be using the [CLS] tokens produced by both models. However, how can this be the case if the tokenizers encoded the text samples without including the special tokens?
  2. If add_special_tokens is False, does the first token of each text sample still encode knowledge about the whole sequence as it usually is the case with [CLS]?
  3. The authors actually use the pooled output, which is produced by BertPooler. Can its output still be considered the [CLS] token?
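For question 3, my understanding is that BertPooler is just a dense layer with a tanh activation applied to the hidden state of the *first* token, whatever that token happens to be. A minimal NumPy sketch of that computation (the weights here are random stand-ins, not real BERT weights):

```python
import numpy as np

def bert_pooler(last_hidden_state, W, b):
    # BertPooler: take the hidden state of the FIRST token in the
    # sequence and pass it through a dense layer with tanh activation.
    first_token = last_hidden_state[:, 0, :]   # (batch, hidden)
    return np.tanh(first_token @ W + b)        # (batch, hidden)

rng = np.random.default_rng(0)
hidden = 4                                     # toy hidden size, not 768
h = rng.normal(size=(2, 5, hidden))            # (batch=2, seq_len=5, hidden)
W = rng.normal(size=(hidden, hidden))          # stand-in pooler weights
b = np.zeros(hidden)

pooled = bert_pooler(h, W, b)                  # depends only on token 0
```

So the pooled output is always derived from the first position; whether that position carries `[CLS]`-style sequence-level information is exactly what I'm unsure about here.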

code: PoliticES2022/PoliticES.ipynb at main · ssantamaria94/PoliticES2022 · GitHub