I have a question about DataCollatorForLanguageModeling when training a language model.
I saw this video, which explains very well how the training process works.
Starting from minute 5:12 it says “the datacollator shifts the input, such that the label is the next token in the sequence for every single token in the input”
It makes sense to me and is a nice explanation of what is happening behind the scenes.
But then, looking into the documentation to understand the mlm parameter, I found the following:
mlm (bool, optional, defaults to True) — Whether or not to use masked language modeling. If set to False, the labels are the same as the inputs with the padding tokens ignored (by setting them to -100). Otherwise, the labels are -100 for non-masked tokens and the value to predict for the masked token.
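To make sure I read the documentation correctly, here is a minimal plain-Python sketch of what it seems to describe for mlm=False (this is my own illustration, not the library's code, and the pad token id of 0 is just an assumed value):

```python
# What the docs seem to say for mlm=False: labels are a copy of the inputs,
# with padding positions replaced by -100 (the value ignored by the loss).
# Note there is no shifting here; every label sits at the SAME position.
PAD_TOKEN_ID = 0  # assumed pad token id for this illustration

def make_labels(input_ids):
    return [-100 if tok == PAD_TOKEN_ID else tok for tok in input_ids]

batch = [15496, 995, 0, 0]  # two real tokens followed by two pad tokens
labels = make_labels(batch)
print(labels)  # [15496, 995, -100, -100]
```

If this reading is right, the labels are position-for-position identical to the inputs (apart from padding), which seems to contradict the video's description of shifting.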
So now I’m totally confused. Is the DataCollator shifting the tokens to the left, or is it only controlling the behavior of the padding tokens?
Thanks