For people interested in tools for logging and comparing different models and training runs in general, Weights & Biases is directly integrated with Transformers.
You just need to have wandb installed and be logged in.
It automatically logs losses, metrics, the learning rate, compute resources, etc.
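A minimal, self-contained sketch (toy data, a small hypothetical setup) showing that the logging kicks in without any wandb-specific code in the script, as long as wandb is installed and you are logged in:

```python
# Prerequisites (shell): pip install wandb && wandb login
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tiny toy dataset so the sketch is runnable as-is.
texts = ["great movie", "terrible movie"] * 8
labels = [1, 0] * 8
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

training_args = TrainingArguments(output_dir="output", num_train_epochs=1, logging_steps=1)
trainer = Trainer(model=model, args=training_args, train_dataset=ToyDataset(encodings, labels))
trainer.train()  # loss, learning rate, epoch, etc. stream to W&B automatically
```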
@boris I have a few questions about the HF Transformers integration:
It looks like wandb is charting the loss, learning rate, and epoch for a given run of `Trainer.train()`. Are there other things that would be useful to chart for a fine-tuning run?
It also looks like wandb uses the `logging_steps` value in `TrainingArguments`. Is this right?
Is it preferred to set wandb behavior through environment variables or directly in the fine-tuning script?
It also logs the validation loss and all the task-dependent metrics defined against your validation dataset.
You can then plot losses/metrics against the epoch instead of the step by selecting it as the x-axis in the W&B interface, if that makes more sense for your use case.
That is correct. It can also log evaluation loss/metrics at the end of training if the evaluation loop is called at the end of your script (usually the case when you use one of the example scripts from the library).
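For example, continuing the sketch above (and assuming the `Trainer` was built with an `eval_dataset` and a `compute_metrics` function), calling the evaluation loop at the end of the script sends those metrics to W&B too:

```python
trainer.train()

# Evaluation after training logs eval_loss plus any metrics returned
# by your compute_metrics function.
eval_metrics = trainer.evaluate()
print(eval_metrics)
```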
There is no preferred way. When I write my own fine-tuning script, I like to pass my variables explicitly, since the code reads more clearly to me, but either way works perfectly fine.
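To illustrate both options side by side (the project and run names here are hypothetical):

```python
import os

# Option 1: environment variables, set before training starts
# (these can equally be exported in the shell).
os.environ["WANDB_PROJECT"] = "my-finetuning-project"
os.environ["WANDB_WATCH"] = "all"  # log parameters as well as gradients

# Option 2: pass values explicitly in the fine-tuning script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    logging_steps=50,              # how often metrics are reported
    run_name="distilbert-lr5e-5",  # displayed as the run name in W&B
)
```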
@boris Thanks for your reply! One more quick question: I see where wandb gets initialized in the HF integration. Suppose I want to also log some other metric or value that isn’t automatically logged by the integration. Is there a way to call the same wandb object that the HF implementation is using from my script, so that this value gets logged too?
Yes, you can call `wandb.log` manually at any time, like so: `wandb.log({"my_metric": 1.2}, step=trainer.global_step)`.
Passing the step is not mandatory, but it is preferred when logging from different places.
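For example, logging a custom value next to the automatically tracked ones (`compute_my_metric` is a hypothetical helper standing in for whatever you want to track):

```python
import wandb

# Assumes training has started, so the W&B run created by the
# integration is already active. Note: where global_step lives may
# differ across transformers versions.
my_value = compute_my_metric()  # hypothetical helper
wandb.log({"my_metric": my_value}, step=trainer.global_step)
```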
I’m thinking that we could add a way to log & track datasets and trained models.
We could either do it with an environment variable or another parameter in TrainingArguments.
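One possible shape for this, sketched with the existing W&B Artifacts API (project, artifact names, and paths are placeholders):

```python
import wandb

run = wandb.init(project="my-finetuning-project")  # hypothetical project

# Track the dataset used for this run.
dataset_artifact = wandb.Artifact("my-dataset", type="dataset")
dataset_artifact.add_dir("data/")   # placeholder path
run.log_artifact(dataset_artifact)

# Track the trained model after Trainer saves it.
model_artifact = wandb.Artifact("my-model", type="model")
model_artifact.add_dir("output/")   # Trainer's output_dir
run.log_artifact(model_artifact)
```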
@boris How did you manage to create such a sweep? Did you use Google Colab or a local machine? (I’m currently struggling to set one up in Colab.)
The docs say there can be trouble using `wandb.agent` with GPU support.
@katharinafluch There are many ways to run sweeps but I actually ran mine in the console.
There is a new version of wandb coming up soon that will better support sweeps in Colab so I can make a demo for it then!
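In the meantime, a minimal sketch of a console-run sweep using the Python API (the project name and training function are placeholders):

```python
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "eval_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-5, 3e-5, 5e-5]},
        "num_train_epochs": {"values": [2, 3]},
    },
}

def train():
    # wandb.init() inside an agent picks up the sweep's chosen
    # hyperparameters through wandb.config.
    with wandb.init():
        lr = wandb.config.learning_rate
        epochs = wandb.config.num_train_epochs
        # ... build TrainingArguments/Trainer with lr/epochs and call trainer.train()

sweep_id = wandb.sweep(sweep_config, project="sweep-demo")  # hypothetical project
wandb.agent(sweep_id, function=train)
```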
@boris I have `WANDB_MODE=dryrun` and `WANDB_WATCH=all` set in my environment. I ran a training session and synced the dry run using `wandb local`. When I view the results on localhost, I don’t see any information about the parameters, just plots of the learning rate, epoch, and loss.
Do you know where I can view this information about the parameters and/or if I have done something wrong to prevent them being recorded?
@boris, should the information about the parameters appear alongside the learning rate, epoch, and loss or are they somewhere else in the wandb dashboard?
By default, you get gradients logged under “gradients” (as long as you have more than 100 training steps).
You can also log parameters by setting `WANDB_WATCH` to `all`, which gives you both parameters and gradients (see the documentation).
I made a demo Colab that also logs both gradients and parameters.
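For reference, a quick way to set it from Python (values per the documentation):

```python
import os

# Set before training starts so the integration's wandb.watch call sees it.
# Accepted values: "gradients" (default), "all", or "false".
os.environ["WANDB_WATCH"] = "all"
```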