Implementation of Two Distinct Datasets with HuggingFace Trainer Module

You only need to extend the Trainer class if you want to customize updates per gradient step, use different optimizers per dataset, or conditionally skip batches.

Once you've implemented the extended class, you'd use it like this:

trainer = MultiDatasetTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    dataset_a=dataset1,
    dataset_b=dataset2,
    bs_a=32,
    bs_b=128,
    train_dataset=None,  # must be None so the overridden dataloader method is used
)
trainer.train()
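For illustration, here is a minimal, framework-free sketch of the interleaving idea. A hypothetical `MultiDatasetTrainer` like the one above would wrap something similar inside an overridden `get_train_dataloader` / training loop; the function names and the simple alternating schedule below are assumptions for the sketch, not part of the Trainer API.

```python
def batched(dataset, batch_size):
    """Yield successive fixed-size batches from a list-like dataset."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

def interleave_batches(dataset_a, dataset_b, bs_a, bs_b):
    """Alternate one batch from each dataset (different batch sizes allowed),
    stopping when the shorter stream of batches is exhausted. A Trainer
    subclass could draw from such a schedule to route each batch to its own
    optimizer or loss."""
    for batch_a, batch_b in zip(batched(dataset_a, bs_a),
                                batched(dataset_b, bs_b)):
        yield ("a", batch_a)
        yield ("b", batch_b)

# Toy data: dataset A yields two batches of 4, dataset B two batches of 6.
ds_a = list(range(8))
ds_b = list(range(100, 112))
schedule = list(interleave_batches(ds_a, ds_b, bs_a=4, bs_b=6))
```

In a real subclass you would replace the lists with your tokenized datasets and the tags `"a"`/`"b"` with whatever routing logic decides which optimizer or loss each batch uses.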
