Hi, I'm using the code below to train a model (DAN is my model class, and DataCollatorWithPadding is the data collator):
co = DataCollatorWithPadding(tokenizer=tokenizer)  # DataCollatorWithPadding requires the tokenizer
training_args = TrainingArguments(
    "DAN",
    num_train_epochs=10,  # must be at least 10
    per_device_train_batch_size=32,
    per_device_eval_batch_size=4,
    learning_rate=0.001,
    save_total_limit=1,
    log_level="error",
    evaluation_strategy="epoch",
)
model = DAN()
trainer = Trainer(
    model=model,
    data_collator=co,
    args=training_args,
    callbacks=[],
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
preds = trainer.predict(small_eval_dataset)
print(compute_metrics(preds))
The best checkpoint gets saved to DAN/checkpoint-5500. I then tried resuming from it by adding resume_from_checkpoint="DAN/checkpoint-5500" to the training arguments, but the resumed run performed noticeably worse than expected (0.13 accuracy vs. ~0.3).
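In case it matters, this is a sketch of how I'm passing the flag. The arguments are the same as in the code above, with only resume_from_checkpoint added; as far as I know, the checkpoint path can also be handed directly to trainer.train(), which I've noted in a comment:

```python
from transformers import TrainingArguments

# Same TrainingArguments as above, plus the flag in question
training_args = TrainingArguments(
    "DAN",
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=4,
    learning_rate=0.001,
    save_total_limit=1,
    log_level="error",
    evaluation_strategy="epoch",
    resume_from_checkpoint="DAN/checkpoint-5500",
)

# Alternatively (not what I tried), the path can be passed at train time:
# trainer.train(resume_from_checkpoint="DAN/checkpoint-5500")
```

Should I be passing it to trainer.train() instead of TrainingArguments?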
Thanks