Using DataCollatorForCompletionOnlyLM requires more memory

I’m following the philschmid blog post "Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora"
to fine-tune Llama 3 8B on completions only, ignoring prompts.
I’m using trl’s `DataCollatorForCompletionOnlyLM`, as explained in the Supervised Fine-tuning Trainer docs.

For some reason, I get CUDA out-of-memory errors at a much smaller batch size than when I ran the same script without the data collator.

Does this have to do with the `packing=False` argument passed to the trainer? Training seems to consume much more memory without packing.
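For context, here is a small self-contained sketch (plain Python, no trl required, with made-up example lengths) of why `packing=False` can inflate memory: with packing, examples are concatenated and chunked into fixed-size blocks, so every batch holds a predictable number of tokens; without packing, each batch is padded to its longest example, so one long sample inflates the whole batch.

```python
def tokens_without_packing(example_lengths, batch_size):
    """Token count when each batch is padded to its longest example
    (what happens with packing=False)."""
    total = 0
    for i in range(0, len(example_lengths), batch_size):
        batch = example_lengths[i:i + batch_size]
        total += max(batch) * len(batch)  # every row padded to the batch max
    return total


def tokens_with_packing(example_lengths, max_seq_length):
    """Token count when examples are concatenated and chunked into
    fixed-size blocks (what happens with packing=True)."""
    n = sum(example_lengths)
    return (n // max_seq_length) * max_seq_length  # trailing remainder dropped


# Hypothetical tokenized prompt+completion lengths
lengths = [128, 2048, 96, 256]

print(tokens_without_packing(lengths, batch_size=2))      # → 4608
print(tokens_with_packing(lengths, max_seq_length=512))   # → 2048
```

In this illustration the unpacked batches carry more than twice the tokens of the packed ones, because the 2048-token outlier forces its whole batch up to that length. That extra activation memory is consistent with hitting OOM at a smaller batch size.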

Thanks in advance for any clarification.