load_best_model_at_end doesn't work?

I am using Trainer to fine-tune a LLaVA model. I want to keep the best model seen during training according to a specific metric (eval_roc_auc), but the model I get after training appears to be the one from the end of training rather than the best one. Here are the relevant calls and the launch command with my hyperparameters:

trainer.train()
trainer.evaluate(eval_dataset)

deepspeed llava/train/train_xformers.py \
    --bf16 False  \
    --fp16 True   \
    --num_train_epochs 10   \
    --per_device_train_batch_size 16  \
    --per_device_eval_batch_size 4  \
    --gradient_accumulation_steps 1  \
    --evaluation_strategy "steps"  \
    --eval_steps 10  \
    --save_steps 10 \
    --save_strategy "steps"    \
    --greater_is_better True \
    --load_best_model_at_end True \
    --metric_for_best_model eval_roc_auc \
    --learning_rate 0.000005   \
    --weight_decay 0.0000   \
    --warmup_ratio 0.03  \
    --lr_scheduler_type "cosine"  \
    --logging_steps 1  \
    --tf32 False  \
    --model_max_length 2048  \
    --gradient_checkpointing True  \
    --dataloader_num_workers 4   \
    --save_total_limit 1 \
    --lazy_preprocess True  \
    --report_to wandb
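
To narrow this down, here is a minimal sketch of how I would check which evaluation step the Trainer actually recorded as best. It assumes the checkpoint directory contains a trainer_state.json with the keys best_metric, best_model_checkpoint and log_history (which is what I understand Trainer writes), and that the metric is logged as eval_roc_auc. The helper names best_eval_entry and summarize_trainer_state are my own and are not part of any library.

```python
import json
from pathlib import Path


def best_eval_entry(log_history, metric="eval_roc_auc", greater_is_better=True):
    """Return the log entry with the best value of `metric`, or None if it was never logged."""
    # Training-loss entries do not carry the eval metric, so filter them out first.
    entries = [entry for entry in log_history if metric in entry]
    if not entries:
        return None
    pick = max if greater_is_better else min
    return pick(entries, key=lambda entry: entry[metric])


def summarize_trainer_state(checkpoint_dir, metric="eval_roc_auc"):
    """Read trainer_state.json (assumed layout) and report what was tracked as best."""
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    state = json.loads(state_path.read_text())
    return {
        "best_metric": state.get("best_metric"),
        "best_model_checkpoint": state.get("best_model_checkpoint"),
        "best_logged": best_eval_entry(state.get("log_history", []), metric),
    }
```

If best_metric is missing, or best_logged points to a different step than best_model_checkpoint, that would suggest the metric name or the checkpoint bookkeeping is the problem rather than the final loading step.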

Thanks for your help in advance.