Hi everyone,
I’m trying to create multilingual batches in a controlled way. Specifically, I want each batch (e.g., with batch_size=32) to contain items from 4 different languages, with 8 examples sampled randomly per language.
I’ve already created a custom list of batches meeting this requirement, but I’m struggling with how to pass these batches to the Trainer. Currently, my Trainer setup looks like this:
trainer = SFTTrainer(
    model=model,
    train_dataset=datasets['train'],
    eval_dataset=datasets['eval'],  # Add evaluation dataset
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=512,
    args=training_arguments,
    formatting_func=formatting_prompts_func
)
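To make the batching requirement concrete, here is a minimal sketch of a language-balanced batch sampler. It illustrates the idea and is not necessarily how the batches in this post were built. The class name `LanguageBalancedBatchSampler`, the `languages` list (one language label per dataset example), and the toy data are all assumptions made for this example and are not part of transformers or trl.

```python
import random
from collections import defaultdict


class LanguageBalancedBatchSampler:
    """Yield lists of dataset indices with a fixed number of examples per language.

    Hypothetical helper for illustration only.
    """

    def __init__(self, languages, per_language=8, seed=0):
        # languages[i] is assumed to be the language label of dataset example i.
        self.per_language = per_language
        self.seed = seed
        self.epoch = 0
        self.by_language = defaultdict(list)
        for index, language in enumerate(languages):
            self.by_language[language].append(index)

    def __len__(self):
        # Only full batches are produced, so the smallest language pool is the limit.
        smallest = min(len(pool) for pool in self.by_language.values())
        return smallest // self.per_language

    def __iter__(self):
        # A different shuffle on each pass, reproducible from the seed.
        rng = random.Random(self.seed + self.epoch)
        self.epoch += 1
        pools = {}
        for language, indices in self.by_language.items():
            shuffled = list(indices)
            rng.shuffle(shuffled)
            pools[language] = shuffled
        for batch_number in range(len(self)):
            start = batch_number * self.per_language
            batch = []
            for shuffled in pools.values():
                batch.extend(shuffled[start:start + self.per_language])
            rng.shuffle(batch)  # mix languages within the batch
            yield batch


# Toy data: 80 examples, 20 per language (assumed labels).
languages = ["en", "de", "fr", "es"] * 20
sampler = LanguageBalancedBatchSampler(languages, per_language=8)
batches = list(sampler)  # 2 batches of 32 indices, 8 per language
```

Any iterable that yields lists of indices, like this one, can be passed as `batch_sampler` to `torch.utils.data.DataLoader`, in which case `batch_size`, `shuffle`, `sampler`, and `drop_last` must be left at their defaults. One commonly suggested way to connect that to the Trainer is to subclass it and override `get_train_dataloader` so that it returns such a DataLoader. Whether that works unchanged with `SFTTrainer` and the `formatting_func` above is an assumption I have not verified.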
Does anyone know how to properly integrate my custom batches with the Trainer? Any guidance would be greatly appreciated!
Thank you in advance!