Documentation: Transformers Language Modeling Section

Hi:

Might you double check the documentation here: Transformers-Tasks-Language Modeling?

Specifically, the TensorFlow section that deals with the DataCollator reads:

"You can use the end of sequence token as the padding token, and set mlm=False . This will use the inputs as labels shifted to the right by one element:

The code reads:

from transformers import DataCollatorForLanguageModeling 
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")

The code, however, never sets the padding token.
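If the snippet is meant to match the quoted text, I would have expected something along these lines (just a sketch, assuming distilgpt2 as an example checkpoint and that the end-of-sequence token is meant to be assigned as the pad token before building the collator):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Assumption: distilgpt2 is only an example checkpoint; its tokenizer has no pad token,
# so the quoted text presumably means reusing the end-of-sequence token for padding.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

# mlm=False switches the collator to causal LM mode: labels are copied from the
# input IDs, with padding positions ignored in the loss.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")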

And then:

"For masked language modeling, use the same DataCollatorForLanguageModeling except you should specify mlm_probability to randomly mask tokens each time you iterate over the data.

And the code reads:

from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")

But the code still has mlm=False and never specifies mlm_probability.
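Presumably the intended snippet looks more like this (again only a sketch, assuming distilroberta-base as an example checkpoint and the 15% masking rate used elsewhere in the docs):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Assumption: distilroberta-base is just an example masked-LM checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

# mlm=True plus mlm_probability randomly masks 15% of the tokens
# each time a batch is drawn from the data.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15, return_tensors="tf"
)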

Also, the associated notebook uses the default DataCollator in the Causal Language Modeling section. But elsewhere in the documentation (Course: Training a Causal Language Model from Scratch), it reads: "By default [DataCollatorForLanguageModeling] prepares data for MLM…"
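To make the inconsistency concrete (a sketch only, with distilroberta-base as a stand-in checkpoint; the point is just that, per the quoted course text, the collator's default produces MLM batches unless mlm=False is passed explicitly):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Assumption: distilroberta-base is only an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

# With no arguments, DataCollatorForLanguageModeling defaults to mlm=True,
# i.e. it prepares masked-LM batches, as the course text says.
mlm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer)

# For causal language modeling, mlm=False has to be passed explicitly.
clm_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)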

Thanks!