I am confused by this as well. Like @ZongqianLi, I can see the metrics in the notebook output, but I do not see anything in my logs.
For instance, this is what I have done:
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Define Trainer
## set up arguments
args = TrainingArguments(
    output_dir=out_dir,
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1500,
    report_to="none",
    logging_dir="../../output/logs",
    logging_strategy="steps",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=5,
    seed=0,
    load_best_model_at_end=True,
)
# establish HF training object
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=torch_learn.compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3), neptune_callback],
)
# train model
trainer.train()
Then, in my .ipynb output, I can see a table that looks like this:
Step  Training Loss  Validation Loss  Accuracy  Precision  Recall    F1
500   0.690300       0.462797         0.841827  0.786610   0.841827  0.813205
...   ...            ...              ...       ...        ...       ...
4000  0.407400       0.524831         0.856672  0.833902   0.856672  0.838811
Next, I go to my logging_dir and I do not see anything there:
$ tree ../output/logs/
I cannot show the output but trust me when I say that there is no log present from the model.
What do I need to do to find the logs and/or reload the trained model from a checkpoint in my local directory and get those metrics?
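To make the second half of the question concrete, here is a minimal sketch of what I imagine reading the metrics back from a checkpoint could look like. It assumes the Trainer wrote a trainer_state.json file with a log_history list into each checkpoint folder under output_dir; the helper names load_log_history and eval_rows are my own, not part of transformers, and I have not confirmed this is the intended way to do it.

```python
import json
from pathlib import Path


def load_log_history(checkpoint_dir):
    """Return the logged metric dicts stored in a checkpoint folder.

    Assumes the folder contains a trainer_state.json file with a
    "log_history" key; returns an empty list if the key is missing.
    """
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    with state_path.open() as f:
        state = json.load(f)
    return state.get("log_history", [])


def eval_rows(log_history):
    """Keep only the evaluation entries (those that carry an eval_loss)."""
    return [row for row in log_history if "eval_loss" in row]
```

With save_steps=1500 I would expect a folder such as checkpoint-1500 under out_dir (that folder name is an assumption on my part), so the call would be something like eval_rows(load_log_history(Path(out_dir) / "checkpoint-1500")).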
Other possibly relevant information:
$ conda list transformers
# Name                   Version   Build    Channel
sentence-transformers    2.2.2     pypi_0   pypi
transformers             4.23.1    pypi_0   pypi