No log for validation loss in trainer.train()

Hello, I trained a model today, but the results of trainer.train() contain no log for the validation loss. The code is as follows:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="./qa_results",
        overwrite_output_dir=False,
        num_train_epochs=1,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=2,
        logging_steps=1,
        logging_dir="./log",
        save_steps=1,
        prediction_loss_only=True,
        evaluation_strategy="steps",
        learning_rate=3e-5,
        weight_decay=0,
    )

    trainer = QuestionAnsweringTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )

    train_result = trainer.train()

And the outputs are:

Could anyone help me? This bug has confused me for a whole day :frowning:

I am confused by this as well. I can see the output, just like @ZongqianLi, but I do not see any output in my logs.

For instance, this is what I have done:


    from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

    # Define Trainer
    ## set up arguments
    args = TrainingArguments(
        output_dir=out_dir,
        evaluation_strategy="steps",
        eval_steps=500,
        save_steps=1500,
        report_to="none",
        logging_dir="../../output/logs",
        logging_strategy="steps",
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate=5E-05,
        num_train_epochs=5,
        seed=0,
        load_best_model_at_end=True,
    )

    # establish HF training object
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=torch_learn.compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3), neptune_callback]
    )

    # train model
    trainer.train()

Then I can see within my .ipynb output a table that looks like this:

    Step  Training Loss  Validation Loss  Accuracy  Precision  Recall    F1
    500   0.690300       0.462797         0.841827  0.786610   0.841827  0.813205
    ...   ...            ...              ...       ...        ...       ...
    4000  0.407400       0.524831         0.856672  0.833902   0.856672  0.838811

Next, I go to my logging_dir and I do not see anything there:


    $ tree ../output/logs/

I cannot show the output but trust me when I say that there is no log present from the model.
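
One possible explanation for the empty directory, offered as a sketch rather than a confirmed diagnosis: `logging_dir` is the directory TensorBoard event files are written to, and `report_to="none"` switches off every reporting integration, TensorBoard included. With that setting nothing would be expected to land in `logging_dir`, even though the notebook table still appears. The fragment below shows the arguments with TensorBoard reporting enabled. It assumes the `tensorboard` package is installed and uses the argument names from transformers 4.23.1; `out_dir` here is a hypothetical placeholder value.

```python
from transformers import TrainingArguments

out_dir = "./output"  # hypothetical placeholder for the out_dir used in the post

args = TrainingArguments(
    output_dir=out_dir,
    evaluation_strategy="steps",
    eval_steps=500,
    logging_strategy="steps",
    logging_steps=500,                # log the training loss as often as evaluation runs
    report_to="tensorboard",          # was "none", which disables all reporting integrations
    logging_dir="../../output/logs",  # TensorBoard event files are written here
)
```

If this assumption holds, running `tensorboard --logdir ../../output/logs` after training should show both the training and the evaluation curves.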

What do I need to do to find the logs and/or reload the trained model from a checkpoint in my local directory and get those metrics?


Other possibly relevant information:

    $ conda list transformers
    # Name                    Version                   Build  Channel
    sentence-transformers     2.2.2                    pypi_0    pypi
    transformers              4.23.1                   pypi_0    pypi
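
On the question of getting the metrics back from a local checkpoint: each `checkpoint-<step>` folder that `Trainer` saves under `output_dir` contains a `trainer_state.json` file, and its `log_history` entry holds the same records that feed the notebook table. The sketch below reads that file with the standard library only. The helper names `load_log_history` and `split_train_eval` are made up for illustration, and the checkpoint path in the usage comment is hypothetical.

```python
import json
from pathlib import Path


def load_log_history(checkpoint_dir):
    """Return the logged entries stored in a checkpoint's trainer_state.json."""
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    with state_path.open(encoding="utf-8") as f:
        state = json.load(f)
    # "log_history" is a list of dicts, one per logging or evaluation event.
    return state.get("log_history", [])


def split_train_eval(log_history):
    """Separate training-loss entries from evaluation entries."""
    # Training entries carry a plain "loss" key.
    train_logs = [entry for entry in log_history if "loss" in entry]
    # Evaluation entries carry keys prefixed with "eval_" (e.g. "eval_loss").
    eval_logs = [
        entry for entry in log_history
        if any(key.startswith("eval_") for key in entry)
    ]
    return train_logs, eval_logs


# Usage (hypothetical path; substitute a real checkpoint folder):
# history = load_log_history("path/to/out_dir/checkpoint-1500")
# train_logs, eval_logs = split_train_eval(history)
# for entry in eval_logs:
#     print(entry["step"], entry.get("eval_loss"))
```

While the `trainer` object is still alive in the notebook, the same list is available directly as `trainer.state.log_history`, so no file needs to be read. If an evaluation entry has no `eval_loss` key, the evaluation loop did not report a loss, which may be what is happening in the first post.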