Logs of training and validation loss

Hi, I made this post to see if anyone knows how I can save my training and validation loss values to the logs.

I’m using this code:

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=50,                 # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=20,
    evaluation_strategy="steps"
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

I thought that using logging_dir and logging_steps would achieve that, but all I see in those logs is this:

output_dir ^A"^X
^Toverwrite_output_dir ^B"^L
^Hdo_train ^B"^K
^Gdo_eval ^A"^N

do_predict ^B"^\
^Xevaluate_during_training ^B"^W
^Sevaluation_strategy ^A"^X
^Tprediction_loss_only ^B"^_
^[per_device_train_batch_size ^C"^^
^Zper_device_eval_batch_size ^C"^\
^Xper_gpu_train_batch_size ^A"^[
^Wper_gpu_eval_batch_size ^A"^_
^[gradient_accumulation_steps ^C"^[
^Weval_accumulation_steps ^A"^Q
^Mlearning_rate ^C"^P
^Lweight_decay ^C"^N

And it goes on like that.
Any ideas are welcome. :slight_smile:

My system installation:
- transformers version: 3.4.0
- Platform: Linux-3.10.0-1127.13.1.el7.x86_64-x86_64-with-centos-7.8.2003-Core
- Python version: 3.6.8
- PyTorch version (GPU?): 1.6.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No


Hi!

I was also recently trying to save my loss values at each logging_steps into a .txt file.

There might be a parameter I am unaware of, but in the meantime I pulled the latest version of the transformers library from git and slightly modified trainer.py, adding the following lines to def log(self, logs: Dict[str, float]) -> None: so it saves my logs into a .txt file:

# TODO PRINT ADDED BY XXX
# append each logged `output` dict to a text file
with open('lossoutput.txt', 'a') as log_save:
    log_save.write(str(output) + '\n')

Happy to hear if there is a less ‘cowboy’ way to do this, one that would not require modifying trainer.py :sweat_smile:

You could also subclass Trainer and override the log method to do this (which is less cowboy-y :wink: ). @lysandre is the logger master and might know a more clever way to directly redirect the logs from our logger.
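Something like this minimal sketch should do it (LoggingTrainer and lossoutput.txt are just illustrative names, and it assumes the log signature quoted above):

from typing import Dict
from transformers import Trainer

class LoggingTrainer(Trainer):
    def log(self, logs: Dict[str, float]) -> None:
        # keep the default logging behavior (console, TensorBoard, ...)
        super().log(logs)
        # additionally append every logged dict (training loss, eval loss, ...) to a text file
        with open('lossoutput.txt', 'a') as f:
            f.write(str(logs) + '\n')

Then just instantiate LoggingTrainer wherever you currently instantiate Trainer.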

3 Likes

The things I’m thinking of are way more cowboy than what you’re doing @aberquand! I think @sgugger’s solution is the cleanest.

You could redirect all the logs to a text file and then filter them out, but your approach here sounds better.
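If you did want to try the redirect route, a rough sketch would be to attach a logging.FileHandler to the transformers logger (trainer.log is just an example file name, and this assumes the messages you want actually go through Python’s logging module):

import logging

# also write everything the transformers loggers emit to a text file
file_handler = logging.FileHandler('trainer.log')
file_handler.setLevel(logging.INFO)
logging.getLogger('transformers').addHandler(file_handler)

You would then still have to filter the loss lines out of that file, which is why the approaches above are nicer.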

I know this is late as hell, but I will leave this here for future reference in case anyone comes across this post. I think an even less cowboy way would be to use a callback:

import transformers

class LogCallback(transformers.TrainerCallback):
    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # metrics holds the evaluation results (e.g. eval_loss) for this evaluation
        print(metrics)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    compute_metrics=compute_metrics,
    callbacks=[LogCallback],
)

Another, even less cowboy way (without implementing anything) is to keep using those logging_steps etc. arguments and simply access the accumulated logs after training is complete:

trainer.state.log_history

There you will have the metrics and losses from all logged steps of training. Hope this helps someone in the future.
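For example, something along these lines should dump the whole history to disk (log_history.json is just an example file name):

import json

# trainer.state.log_history is a list of dicts such as
# {'loss': ..., 'step': ...} for training logs and {'eval_loss': ..., 'step': ...} for eval logs
with open('log_history.json', 'w') as f:
    json.dump(trainer.state.log_history, f, indent=2)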
