Hello
I am fine-tuning a foundation model on language modeling (next-token prediction). As you can see in my code below, evaluation is performed every 5 steps:
import transformers

train_args = transformers.TrainingArguments(
    output_dir=output_dir,
    warmup_steps=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    max_steps=max_steps,
    learning_rate=2.5e-5,
    evaluation_strategy="steps",
    eval_steps=5,
)
trainer = transformers.Trainer(
    model=model,
    train_dataset=train,
    eval_dataset=eval,
    args=train_args,
    data_collator=data_collator,
)
trainer.train()
The training time for each step is normal, but each evaluation step is very long!
Would you have any idea why it takes so long?
Thank you