How to train a causal language model

I found an example in [transformers/examples/pytorch/language-modeling/].
According to the script, the Trainer uses `default_data_collator` for causal language modelling (transformers/examples/pytorch/language-modeling/ at 98dda8ed03ac3f4af5733bdddaa1dab6a81e15c1 · huggingface/transformers · GitHub).
Shouldn't we use `DataCollatorForLanguageModeling` to shift the inputs and labels by one token instead? It seems that `default_data_collator` can't achieve this goal.
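For clarity, here is a plain-Python sketch of the one-token shift the question is about (an illustration of the idea only, not the actual code of either collator):

```python
# For next-token prediction, the model at position t should predict token t+1.
# A shift by one token pairs each input position with the following token.
input_ids = [10, 20, 30, 40]

inputs = input_ids[:-1]  # what the model sees:    [10, 20, 30]
labels = input_ids[1:]   # what it should predict: [20, 30, 40]

print(inputs, labels)
```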