How is the CLS special token embedding initialized?

Just like any other token, the CLS token's embedding is randomly initialized from a normal distribution. The only exception is the padding token, whose embedding is set to zero.
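To make that concrete, here is a minimal plain-Python sketch of the idea. It is not the code of any particular library: the function name `init_embeddings`, the toy vocabulary, the `[PAD]`/`[CLS]` token strings, and the standard deviation of 0.02 are all assumptions chosen for illustration.

```python
import random


def init_embeddings(vocab, dim, std=0.02, pad_token="[PAD]", seed=0):
    """Hypothetical helper: build a {token: vector} embedding table.

    Every token, including [CLS], gets a vector drawn from N(0, std^2).
    The padding token is the one exception: its vector is all zeros.
    std=0.02 is an assumed value, not taken from the post above.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    table = {}
    for token in vocab:
        if token == pad_token:
            # Padding row is set to zero rather than sampled.
            table[token] = [0.0] * dim
        else:
            # [CLS] is treated exactly like an ordinary word here.
            table[token] = [rng.gauss(0.0, std) for _ in range(dim)]
    return table


vocab = ["[PAD]", "[CLS]", "[SEP]", "hello", "world"]  # toy vocabulary
emb = init_embeddings(vocab, dim=8)

print(emb["[PAD]"])  # all zeros
print(emb["[CLS]"])  # small random values, same treatment as "hello"
```

Nothing in the initialization singles out CLS; whatever special role it ends up playing comes from training, not from its starting values.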
