Hi everyone.
I was going through the *Fine-tuning a masked language model* section of the course, and there's one thing I can't understand. I'm talking about the following piece of code:
concatenated_examples = {
k: sum(tokenized_samples[k], []) for k in tokenized_samples.keys()
}
total_length = len(concatenated_examples["input_ids"])
print(f"'>>> Concatenated reviews length: {total_length}'")
From what I know, BERT expects exactly one [CLS] token at the beginning of each sequence. So if we concatenate all these texts and then split them into fixed-size chunks, a single chunk can end up containing multiple [CLS] and [SEP] tokens (which is not BERT-like), e.g.
[CLS] ... [SEP] [CLS] ... [SEP] ...
in one sequence. Why does this still work? Is there a paper that describes this behavior, or maybe some other source?
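To make the concern concrete, here is a toy sketch of what I mean (using token strings instead of real ids, and a made-up tiny chunk size; the course uses a `chunk_size` of 128):

```python
# Toy tokenized reviews, each with its own [CLS]/[SEP] pair.
tokenized_samples = {
    "input_ids": [
        ["[CLS]", "great", "movie", "[SEP]"],
        ["[CLS]", "boring", "plot", "but", "nice", "score", "[SEP]"],
        ["[CLS]", "ok", "[SEP]"],
    ]
}

# Same concatenation trick as in the course snippet.
concatenated_examples = {
    k: sum(tokenized_samples[k], []) for k in tokenized_samples.keys()
}

chunk_size = 5  # toy value for illustration
chunks = [
    concatenated_examples["input_ids"][i : i + chunk_size]
    for i in range(0, len(concatenated_examples["input_ids"]), chunk_size)
]
for chunk in chunks:
    print(chunk)
```

The first chunk comes out as `['[CLS]', 'great', 'movie', '[SEP]', '[CLS]']`, so the special tokens end up scattered in the middle of chunks rather than only at the boundaries.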
Thank you.