How to train a causal language model

I found an example in [transformers/examples/pytorch/language-modeling/].
According to the script, the Trainer uses `default_data_collator` for causal language modelling (transformers/examples/pytorch/language-modeling/ at 98dda8ed03ac3f4af5733bdddaa1dab6a81e15c1 · huggingface/transformers · GitHub).
Shouldn't we use `DataCollatorForLanguageModeling` to shift the inputs and labels by one token instead? It seems that `default_data_collator` can't achieve this goal.
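For clarity, here is a plain-Python sketch of the one-token shift the question is about (an illustration of the idea only, not the actual code of either collator):

```python
# For next-token prediction, the model at position t should predict token t+1.
# A shift by one token pairs each input position with the following token.
input_ids = [10, 20, 30, 40]

inputs = input_ids[:-1]  # what the model sees:    [10, 20, 30]
labels = input_ids[1:]   # what it should predict: [20, 30, 40]

print(inputs, labels)
```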