Two eval sets but no validation loss

Hi, I am trying to pass two evaluation sets to Trainer, but something strange is happening: I get no validation loss.

Here are my training args:

training_args_dict = {
    "output_dir": save_dir,
    "overwrite_output_dir": True,
    "logging_strategy": "epoch",
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "per_device_train_batch_size": batch_size,
    "per_device_eval_batch_size": batch_size,
    "save_total_limit": 1,
    "num_train_epochs": epochs,
    "predict_with_generate": True,
    "report_to": report_to,
    "run_name": run_name,
    "load_best_model_at_end": True,
    "seed": SEED,
    "generation_config": gen_config,
    "metric_for_best_model": "eval_validation_loss",
    # "disable_tqdm" : True,
}
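
For reference, my understanding (based on recent transformers versions, so treat this as an assumption) is that when eval_dataset is a dict, the Trainer evaluates each entry separately and folds the dict key into the metric prefix, which is why metric_for_best_model is set to "eval_validation_loss" above. A rough sketch of the per-epoch log keys I expect:

# Hypothetical log keys, assuming the dataset-name prefixing behaviour for
# eval_dataset={"validation": ..., "training": ...}
expected_metric_keys = [
    "eval_validation_loss",   # the key metric_for_best_model points to
    "eval_validation_runtime",
    "eval_training_loss",
    "eval_training_runtime",
]
print(expected_metric_keys)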

Here is how I build the Trainer:

# model, tokenizer and compute_metrics are defined globally elsewhere in my script
from transformers import (
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from transformers.optimization import Adafactor, AdafactorSchedule


def get_trainer(
    input_col,
    label_col,
    model_name,
    path_to_hf_data,
    data_collator,
    training_args_dict,
    early_stop_round=20,
):
    tokenized_dataset = preprocess_text(path_to_hf_data, input_col, label_col)
    args = Seq2SeqTrainingArguments(**training_args_dict)
    optimizer = Adafactor(
        model.parameters(),
        scale_parameter=True,
        relative_step=True,
        warmup_init=True,
        lr=None,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized_dataset["train"],
        eval_dataset={
            "validation" : tokenized_dataset["valid"],
            "training" : tokenized_dataset["train"],
        },
        data_collator=data_collator,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        optimizers=(optimizer, AdafactorSchedule(optimizer)),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=early_stop_round)],
    )
    return trainer
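
For completeness, this is roughly how I call it (the argument values here are placeholders for this post):

trainer = get_trainer(
    input_col="input_text",
    label_col="target_text",
    model_name=model_name,
    path_to_hf_data=path_to_hf_data,
    data_collator=data_collator,
    training_args_dict=training_args_dict,
    early_stop_round=20,
)
trainer.train()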

Could someone help me with this? The documentation is not clear to me on how to pass two evaluation sets to the Trainer.

Unless you have a specific reason, it's best to evaluate without the training dataset.

Pass it as eval_dataset=tokenized_dataset['valid'].
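
In code, the suggested change would look something like this (a minimal sketch reusing the names from your snippet; note that with a single eval set the logged metric becomes plain eval_loss, so metric_for_best_model would need to be "eval_loss" rather than "eval_validation_loss"):

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["valid"],  # single eval set instead of a dict
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    optimizers=(optimizer, AdafactorSchedule(optimizer)),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=early_stop_round)],
)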

The reason for this is that I want to see the metrics for the training set, too. The other day I saw a comment saying this was not the intention of the Trainer API, so I gave up on this.

I don’t think we can gain much additional context by evaluating on the training set. It’s kind of like doing an exam paper where you’ve already seen the questions.