You can avoid this error by passing `remove_unused_columns=False` to `TrainingArguments`, but a cleaner solution is to use `map` to tokenize the dataset before passing it to the `Trainer` (instead of tokenizing lazily).
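For example, a minimal sketch of that eager tokenization step (the dataset and model names here are hypothetical placeholders, assuming a dataset with a `text` column):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical dataset and checkpoint, purely for illustration.
raw_dataset = load_dataset("imdb", split="train")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Tokenize eagerly with map and drop the raw columns, so the Trainer
# only sees columns that the model's forward() accepts.
tokenized_dataset = raw_dataset.map(
    tokenize, batched=True, remove_columns=raw_dataset.column_names
)
```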
After this change, you should get the “The model did not return a loss from the inputs …” error, which you can fix by returning a `labels` column in the collate function (equal to `input_ids`).
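One way to write such a collate function (a minimal sketch for causal language modeling, reusing the `tokenizer` from the snippet above; the model computes the loss internally by shifting the labels):

```python
def collate_fn(examples):
    # Pad the variable-length tokenized examples into a batch of tensors.
    batch = tokenizer.pad(examples, padding=True, return_tensors="pt")
    # For causal LM, labels are simply a copy of input_ids. A refinement
    # would mask padding positions to -100 so they are ignored by the loss,
    # which is what the built-in collator mentioned below does.
    batch["labels"] = batch["input_ids"].clone()
    return batch
```

You would then pass this function as the `data_collator` argument of the `Trainer`.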
(`DataCollatorForLanguageModeling` handles this automatically, so it’s better to perform the tokenization in `map` and then use this collator as the `data_collator`, as explained in our NLP course.)
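A sketch of that setup, continuing from the snippets above (with `mlm=False` for causal language modeling; set `mlm=True` instead for masked LM; the model checkpoint and output directory are placeholders):

```python
from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")

# mlm=False -> causal LM: the collator pads the batch and sets labels to
# a copy of input_ids, with padding positions masked to -100.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),  # hypothetical output dir
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
```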