How to evaluate a trained model?

Hello everyone, I’m new to model evaluation. I trained a Qwen model on my own dataset, and now I need to evaluate it using the loss function, but I don’t know how to do it. I see examples for other metrics like accuracy and precision, but how do I evaluate using the loss function? I have prepared a new dataset (500 entries) for this, but I don’t know how I should carry on with trainer.evaluate(). Do I need to set max_steps, or which arguments are essential? These are my training arguments:

training_args = DPOConfig(
    output_dir=logging_dir,
    logging_steps=10,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    loss_type=["sft"],
    loss_weights=[1.0],
    max_prompt_length=512,
    max_completion_length=512,
    num_train_epochs=100,
    max_steps=100000,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    save_strategy="steps",
    save_steps=25000,
    eval_strategy="steps",
    eval_steps=100,
)

trainer = DPOTrainer(
    model=model,
    processing_class=tokenizer,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["valid"],
)

trainer.train()


Do I need to set max_steps, or which arguments are essential?

Probably not. max_steps only caps the number of training steps; trainer.evaluate() always runs over the full eval dataset, so you don’t need it just to get the loss.
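
You can call evaluate() directly on the trainer you already built. A minimal sketch, where new_eval_ds is a placeholder name for the 500-entry dataset you prepared:

# Minimal sketch: reuse the trainer from the snippet above.
# `new_eval_ds` is a placeholder for your 500-entry evaluation dataset.
metrics = trainer.evaluate(eval_dataset=new_eval_ds)
print(metrics["eval_loss"])  # single aggregated loss over the whole eval set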

how do I evaluate using the loss function?

Hmm, SFTTrainer aside, I can’t find much documentation about evaluating with DPOTrainer.
Something like this?

# pip install -U trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOTrainer, DPOConfig

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
policy = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# 1) Load UltraFeedback-binarized preference split
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="test_prefs").select(range(10))

# 2) Keep only preference keys; drop 'messages', scores, ids, etc.
keep = {"prompt", "chosen", "rejected"}
drop = [c for c in ds.column_names if c not in keep]
eval_ds = ds.remove_columns(drop)

# 3) Tiny dummy train set to satisfy older TRL constructors that prep both splits
dummy_train = eval_ds.select(range(1))

# 4) Config: no generation during eval; loss-only
args = DPOConfig(
    output_dir="dpo-eval-demo",
    do_train=False,
    do_eval=True,
    per_device_eval_batch_size=2,
    generate_during_eval=False,   # correct flag in DPOConfig
    max_prompt_length=512,
    max_completion_length=512,
    reference_free=True,          # set False + pass ref_model if you have one
    report_to="none",
)

trainer = DPOTrainer(
    model=policy,
    args=args,
    train_dataset=dummy_train,
    eval_dataset=eval_ds,
    processing_class=tok,
)

metrics = trainer.evaluate(metric_key_prefix="dpo")
print({k: metrics[k] for k in metrics if k.startswith("dpo_") or k.startswith("eval_")})
# Read: dpo_loss (aggregated eval loss), eval_rewards/accuracies, eval_rewards/margins, eval_rewards/chosen, eval_rewards/rejected
# {'dpo_loss': 5.722265720367432, 'dpo_runtime': 17.2569, 'dpo_samples_per_second': 0.579, 'dpo_steps_per_second': 0.29, 'eval_rewards/chosen': -0.003398055676370859, 'eval_rewards/rejected': -0.0041963583789765835, 'eval_rewards/accuracies': 0.5, 'eval_rewards/margins': 0.0007982999086380005, 'eval_logps/chosen': -346.3999938964844, 'eval_logps/rejected': -438.79998779296875, 'eval_logits/chosen': -2.246875047683716, 'eval_logits/rejected': -1.3703124523162842}
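
One thing to double-check against your own setup: the demo above uses the default DPO loss, while you trained with loss_type=["sft"]. You can probably carry loss_type and loss_weights into the eval config so that the reported loss matches your training objective; this is a sketch, assuming your TRL version accepts loss_type as a list (your training config suggests it does):

args = DPOConfig(
    output_dir="dpo-eval-demo",
    do_train=False,
    do_eval=True,
    per_device_eval_batch_size=2,
    generate_during_eval=False,
    max_prompt_length=512,
    max_completion_length=512,
    loss_type=["sft"],     # assumption: same objective you trained with
    loss_weights=[1.0],
    reference_free=True,
    report_to="none",
)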

I tried it on my dataset and it seems to work. I have one question: I need to plot the loss values as the evaluation runs, but right now I only get a single aggregated value. What should I change to get a plot? Is it possible to save the intermediate values so I can plot them afterward?


When it comes to step-by-step values, I think the standard approach is to log them during training, like below. While it’s possible to do it afterward, the code becomes significantly more complicated. Once training has run with evaluation enabled, though, everything that was logged stays on trainer.state.log_history, so you can still dump and plot it afterward (see the sketch after the snippet below).

args = DPOConfig(
    output_dir="dpo-eval-demo",
    do_train=True,                     # training must run to log stepwise eval
    do_eval=True,
    eval_strategy="steps",             # renamed from evaluation_strategy in recent transformers
    eval_steps=100,
    logging_strategy="steps",
    logging_steps=10,
    report_to="tensorboard",           # or "wandb"
    logging_dir="tb_logs",
    generate_during_eval=False,
    reference_free=True,
)
trainer = DPOTrainer(model=policy, args=args, train_dataset=your_train, eval_dataset=eval_ds, processing_class=tok)
trainer.train()
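
If you’d rather save the values and plot them yourself, the logged entries are still on trainer.state.log_history after train() returns. A sketch, assuming matplotlib is installed:

import json
import matplotlib.pyplot as plt

# log_history is a list of dicts; eval entries contain "eval_loss" and "step"
history = trainer.state.log_history
with open("log_history.json", "w") as f:
    json.dump(history, f, indent=2)  # keep the raw values for later

eval_points = [h for h in history if "eval_loss" in h]
steps = [h["step"] for h in eval_points]
losses = [h["eval_loss"] for h in eval_points]

plt.plot(steps, losses, marker="o")
plt.xlabel("step")
plt.ylabel("eval_loss")
plt.savefig("eval_loss.png")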