How is CLS special token embedding initialized?

sgavela · March 16, 2022, 11:07am

Hi! Does anyone know how the CLS token is initialized in BERT? Say I wanted to train a BERT model from scratch (which, of course, I’m not doing): how should I initialize the CLS embedding? Just at random from some distribution, such as uniform? How is this done in BERT?

BramVanroy · March 16, 2022, 3:46pm

Just like the other tokens, the CLS token is randomly initialized from a normal distribution with mean 0 and standard deviation config.initializer_range (0.02 by default). The only exception is the padding token, whose embedding is set to zero.

https://github.com/huggingface/transformers/blob/204c54d411c2b4c7f31405203533a51632f46ab1/src/transformers/models/bert/modeling_bert.py#L731-L734

            elif isinstance(module, nn.Embedding):
                module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
                if module.padding_idx is not None:
                    module.weight.data[module.padding_idx].zero_()
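
To see what that branch does to the CLS row specifically, here is a minimal, dependency-free sketch that mimics the same logic in plain Python rather than calling PyTorch. The helper name `init_embedding`, the toy table size, and the seed are made up for illustration; the ids 0 for [PAD] and 101 for [CLS] are the ones used by the bert-base-uncased vocabulary, and 0.02 is the default `initializer_range`.

```python
import random
import statistics

# Assumptions for illustration only: 0.02 is the default initializer_range in
# BertConfig; ids 0 and 101 are [PAD] and [CLS] in the bert-base-uncased vocab.
INITIALIZER_RANGE = 0.02
PAD_ID = 0
CLS_ID = 101


def init_embedding(vocab_size, hidden_size, std, padding_idx=None, seed=0):
    """Hypothetical helper: build a table the way the quoted nn.Embedding branch does."""
    rng = random.Random(seed)
    # Every row, including the [CLS] row, is drawn from a normal distribution
    # with mean 0 and standard deviation `std`.
    table = [
        [rng.gauss(0.0, std) for _ in range(hidden_size)]
        for _ in range(vocab_size)
    ]
    # Only the padding row is overwritten with zeros.
    if padding_idx is not None:
        table[padding_idx] = [0.0] * hidden_size
    return table


# Toy sizes so the pure-Python loop stays fast; BERT-base uses 30522 x 768.
embeddings = init_embedding(
    vocab_size=200, hidden_size=64, std=INITIALIZER_RANGE, padding_idx=PAD_ID
)

cls_row = embeddings[CLS_ID]
pad_row = embeddings[PAD_ID]
cls_std = statistics.pstdev(cls_row)

print("CLS row std:", round(cls_std, 4))
print("PAD row all zero:", all(v == 0.0 for v in pad_row))
```

The CLS row comes out as ordinary Gaussian noise with a spread close to 0.02, exactly like any other token row, while only the padding row is zeroed. Whatever special role CLS ends up playing is learned during training, not set at initialization.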
