I am using the Hugging Face Trainer to fine-tune a LLaVA model. I want it to save the best model during training based on a specific metric, but I found that at the end of training it loads the final model instead of the best one seen during training. Here are my relevant hyperparameters:
trainer.train()
trainer.evaluate(eval_dataset)
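For context, the metric is produced by a compute_metrics function passed to the Trainer. A minimal sketch of the wiring (my real callback differs in preprocessing; the rank-based helper below is just a stand-in for sklearn.metrics.roc_auc_score and assumes binary labels with no tied scores):

```python
import numpy as np

def rank_auc(labels, scores):
    # Mann-Whitney U statistic normalized to [0, 1]; assumes no tied scores
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    logits, labels = np.asarray(logits), np.asarray(labels)
    # assume binary classification: take the positive-class score
    scores = logits[:, 1] if logits.ndim == 2 else logits
    # the Trainer prefixes eval metrics with "eval_", so this key is logged
    # as "eval_roc_auc" -- the name metric_for_best_model has to match
    return {"roc_auc": float(rank_auc(labels, scores))}
```

The key point is the naming: the dict key "roc_auc" becomes "eval_roc_auc" in the logs, which is what metric_for_best_model compares against.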
deepspeed llava/train/train_xformers.py \
--bf16 False \
--fp16 True \
--num_train_epochs 10 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "steps" \
--eval_steps 10 \
--save_steps 10 \
--save_strategy "steps" \
--greater_is_better True \
--load_best_model_at_end True \
--metric_for_best_model eval_roc_auc \
--learning_rate 0.000005 \
--weight_decay 0.0000 \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 False \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--save_total_limit 1 \
--lazy_preprocess True \
--report_to wandb
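In case it helps, here is the same configuration expressed as TrainingArguments in Python (a sketch: output_dir is a placeholder, and the LLaVA-specific flags such as model_max_length and lazy_preprocess are consumed by the training script, not by TrainingArguments):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                    # placeholder
    bf16=False,
    fp16=True,
    num_train_epochs=10,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    evaluation_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=10,   # with load_best_model_at_end, must be a round multiple of eval_steps
    greater_is_better=True,
    load_best_model_at_end=True,
    metric_for_best_model="eval_roc_auc",
    learning_rate=5e-6,
    weight_decay=0.0,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    logging_steps=1,
    tf32=False,
    gradient_checkpointing=True,
    dataloader_num_workers=4,
    save_total_limit=1,
    report_to="wandb",
)
```

After training, trainer.state.best_metric and trainer.state.best_model_checkpoint show which checkpoint the Trainer considers best and intends to reload, which is how I noticed the mismatch.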
Thanks in advance for your help.