Also, I was wondering whether it is a sync operation: on a slow connection it can take quite some time to upload 1 GB to the cloud.
It would also be nice to have a progress bar. I had a cell with a total running time of 6 hours: the model trained in 2, and the rest was wandb finish. In the end I stopped the cell, and wandb detected the 'ctrl-c' and reported that the data upload was interrupted, but I think my data was already on the cloud and finish just got stuck somewhere.
Hey @boris, I am stuck with an issue and hope you might be able to resolve it. I am currently using transformers (in particular the Trainer API) to run sweeps via the CLI. The Trainer saves model checkpoints every save_steps under output_dir. Since my training procedure takes quite some time, I am wondering how I would be able to resume preempted sweeps in the context of the Trainer API, i.e. by loading previous checkpoints from output_dir.
Currently, I thought about something like this, however, I am not sure if it would work:
from pathlib import Path  # needed for the checkpoint-directory check

# forward slashes work on Windows too when using pathlib
ckpt_dir = Path(f'models/checkpoints_{wandb.config.gradient_accumulation_steps}_{wandb.config.learning_rate}')
if wandb.run.resumed and ckpt_dir.exists() and any(ckpt_dir.iterdir()):
    trainer.train(resume_from_checkpoint=True)
else:
    trainer.train()
Hi @simonschoe
Are your models auto-logged as artifacts (with WANDB_LOG_MODEL)?
If so, you don't even have to manage your models locally, as you can easily redownload the best model from your sweep later.
Thanks for your reply! Do you know if the setting os.environ['WANDB_LOG_MODEL'] = 'true' also ensures that model checkpoints are uploaded/logged along the way? For example, I'd like to store model checkpoints every x steps. However, from the artifacts documentation I inferred that artifacts are generally stored after a successful training run, i.e. at the end. Is this correct?
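For reference, this is roughly how I set it (a minimal sketch; I set the variable before creating the Trainer so the callback can pick it up at setup):

```python
import os

# Assumption under discussion: with this set before the Trainer is created,
# checkpoints written every save_steps would also be logged as artifacts,
# not just the final model.
os.environ['WANDB_LOG_MODEL'] = 'true'
```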
And it looks like when the code cell for pre-training has run, it just keeps running even after the model has been trained. That is, it apparently never gets past wandb.finish() and keeps on going.
Looks like his post was answered.
What matters is that wandb logs any metric that is produced by the Trainer, so you need to make sure you pass the correct arguments to the Trainer (in particular the evaluation strategy and the evaluation interval).
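For example, something along these lines (a sketch only; the directory and step values are illustrative, not a recommendation):

```python
from transformers import TrainingArguments

# Ask the Trainer to run evaluation on a fixed step interval so that there
# are eval metrics for wandb to pick up, and route logging to W&B.
args = TrainingArguments(
    output_dir="out",                 # illustrative path
    evaluation_strategy="steps",      # evaluate every eval_steps
    eval_steps=500,                   # illustrative interval
    logging_steps=100,                # how often training metrics are logged
    report_to="wandb",
)
```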
I have a trainer in which I'm overriding the evaluate method to inject some custom functionality. In particular, I'm computing some metrics "by hand" in the evaluate method, returning them in the appropriate Dict[str, float] object. However, these metrics aren't being logged to W&B. Is this because they're not being computed via the compute_metrics function typically passed to Trainer? I'm invoking W&B in the simplest way possible here, just passing report_to="wandb" in TrainingArguments.
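My current hypothesis is that metrics only reach the reporting callbacks (such as the W&B one) when they pass through self.log(), so simply returning them from evaluate would not be enough. A minimal stand-in sketching that flow (MiniTrainer and the metric names are hypothetical, not the real transformers API):

```python
# Hypothetical stand-in for the Trainer's logging flow: callbacks only see
# what goes through self.log(), never the raw return value of evaluate().

class MiniTrainer:
    def __init__(self):
        self.logged = []  # what reporting callbacks would receive

    def log(self, metrics):
        # in the real Trainer, this fans out to report_to targets
        self.logged.append(dict(metrics))

    def evaluate(self):
        metrics = {"eval_loss": 0.5}
        self.log(metrics)  # the built-in evaluate logs before returning
        return metrics


class CustomTrainer(MiniTrainer):
    def evaluate(self):
        metrics = super().evaluate()
        extra = {"eval_my_metric": 0.9}  # computed "by hand"
        metrics.update(extra)
        self.log(extra)  # without this call, the callback never sees `extra`
        return metrics
```

If that hypothesis is right, calling self.log(...) on the hand-computed metrics inside the overridden evaluate should make them show up in W&B.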