Hello, I’m reading a paper where BERT (`TFBertModel`) and RoBERTa (`TFRobertaModel`) are used to solve a text classification task.
Going through the implementation, I noticed that each text sample is tokenized without adding the special tokens.
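For context, the tokenization step looks roughly like the following (a minimal sketch of my own, assuming the standard Hugging Face tokenizer with `add_special_tokens=False`; the checkpoint and variable names are mine, not the paper's):

```python
from transformers import BertTokenizer

# hypothetical checkpoint; the paper may use a different one
tokenizer_BERT = BertTokenizer.from_pretrained("bert-base-uncased")

text = "This movie was surprisingly good."

# Tokenize WITHOUT special tokens, as described in the implementation
enc = tokenizer_BERT(text, add_special_tokens=False)

print(tokenizer_BERT.convert_ids_to_tokens(enc["input_ids"]))
# e.g. ['this', 'movie', 'was', 'surprisingly', 'good', '.']
# -> no [CLS] at position 0 and no [SEP] at the end
```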
Later on, the outputs of the tokenizers are passed to the respective models and the pooled output is retrieved, as follows:
```python
embedding_BERT = encoder_BERT(
    input_ids_BERT,
    token_type_ids=token_type_ids_BERT,
    attention_mask=attention_mask_BERT,
)['pooler_output']
```
- The authors claim to be using the `[CLS]` tokens produced by both models. However, how can this be the case if the tokenizers encoded the text samples without including the special tokens? And if the special tokens were indeed left out, does the first token of each text sample still encode knowledge about the whole sequence, as is usually the case with the `[CLS]` token?
- The authors actually use the pooled output, which is produced by `BertPooler`. Can its output still be considered as the `[CLS]` token's representation? (See the sketch below.)
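To show what I mean, here is a minimal sketch of my understanding of the pooler: Hugging Face's `BertPooler` applies a learned dense layer plus a `tanh` to the hidden state of the *first* token, so `pooler_output` differs from `last_hidden_state[:, 0]`. The checkpoint and variable names below are my own assumptions, not taken from the paper:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

# hypothetical checkpoint; the paper may use a different one
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder_BERT = TFBertModel.from_pretrained("bert-base-uncased")

enc = tokenizer(
    "This movie was surprisingly good.",
    add_special_tokens=False,   # as in the paper's preprocessing
    return_tensors="tf",
)

outputs = encoder_BERT(
    enc["input_ids"],
    token_type_ids=enc["token_type_ids"],
    attention_mask=enc["attention_mask"],
)

first_token_state = outputs["last_hidden_state"][:, 0]  # hidden state of the first token
pooled = outputs["pooler_output"]                       # dense + tanh applied to that state

print(first_token_state.shape, pooled.shape)  # (1, 768) (1, 768)
# The two tensors differ, since the pooler transforms the first token's state:
print(tf.reduce_max(tf.abs(first_token_state - pooled)).numpy())
```

If I read the source correctly, `pooler_output` is essentially `tanh(dense(last_hidden_state[:, 0]))`, which is why I am unsure whether it can still be read as a `[CLS]` representation when no `[CLS]` token is present at position 0.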