Fine-tuning a masked language model

Hi everyone.

I was going through the "Fine-tuning a masked language model" section of the course, and there is one thing I don't understand. I'm talking about the following piece of code:

```python
# Flatten each field (input_ids, attention_mask, ...) into a single list
concatenated_examples = {
    k: sum(tokenized_samples[k], []) for k in tokenized_samples.keys()
}
total_length = len(concatenated_examples["input_ids"])
print(f"'>>> Concatenated reviews length: {total_length}'")
```
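To make the concatenation concrete, here is a self-contained toy run of the same snippet. The `tokenized_samples` dict below is a hypothetical stand-in for the tokenized dataset. I'm assuming the bert-base-uncased IDs 101 = [CLS] and 102 = [SEP]; the other IDs are arbitrary placeholders.

```python
# Hypothetical stand-in for the tokenized dataset: two "reviews", each
# already wrapped in [CLS] ... [SEP] by the tokenizer.
# 101/102 are [CLS]/[SEP] in bert-base-uncased; other IDs are placeholders.
tokenized_samples = {
    "input_ids": [
        [101, 2023, 3185, 2001, 2307, 102],
        [101, 6659, 5436, 102],
    ],
    "attention_mask": [
        [1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1],
    ],
}

# sum(list_of_lists, []) concatenates the inner lists left to right.
concatenated_examples = {
    k: sum(tokenized_samples[k], []) for k in tokenized_samples.keys()
}
total_length = len(concatenated_examples["input_ids"])
print(f"'>>> Concatenated reviews length: {total_length}'")

# The second review's [CLS] (101) now sits in the middle of the list.
print(concatenated_examples["input_ids"])
```

After flattening, the special tokens of every review are still there, so the second [CLS] ends up at position 6 of one long list.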

As far as I know, BERT expects a single [CLS] token, at the beginning of the input. So if we concatenate all these texts and then split them into chunks, one chunk can end up with several [CLS] tokens (which is not how BERT inputs normally look):

[CLS] ... [SEP] [CLS] ... [SEP] ...

Why does this still work? Is there a paper, or any other source, that describes this behavior?
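A minimal sketch of the split step described above, to show the effect. The `split_into_chunks` helper, the toy IDs, and the tiny `chunk_size=5` are all hypothetical choices for illustration (the course uses its own grouping function with a much larger chunk size, which may differ in detail). Like many such helpers, it drops the leftover tokens that do not fill a whole chunk.

```python
CLS_ID, SEP_ID = 101, 102  # [CLS]/[SEP] IDs in bert-base-uncased (assumed)


def split_into_chunks(ids, chunk_size):
    """Split a flat list of token IDs into fixed-size chunks, dropping the remainder."""
    usable = (len(ids) // chunk_size) * chunk_size
    return [ids[i : i + chunk_size] for i in range(0, usable, chunk_size)]


# Three toy "reviews" (placeholder IDs), each wrapped in [CLS] ... [SEP].
reviews = [
    [CLS_ID, 11, 12, 13, SEP_ID],
    [CLS_ID, 21, 22, SEP_ID],
    [CLS_ID, 31, 32, 33, 34, SEP_ID],
]
concatenated = sum(reviews, [])  # 15 tokens in total
chunks = split_into_chunks(concatenated, chunk_size=5)

for chunk in chunks:
    print(chunk, "-> [CLS] count:", chunk.count(CLS_ID))
# [101, 11, 12, 13, 102] -> [CLS] count: 1
# [101, 21, 22, 102, 101] -> [CLS] count: 2
# [31, 32, 33, 34, 102] -> [CLS] count: 0
```

So a chunk may contain zero, one, or several [CLS] tokens, and it does not necessarily start with one.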

Thank you.