I am fine-tuning a BERT model for a multiclass classification problem. During training, my losses look a bit “unhealthy”: my validation loss is always smaller than my training loss (eval_steps=20). How can I plot a loss curve with a Trainer() model?
Scott from Weights & Biases here. I don’t want to be spammy, so I’ll delete this if it’s not helpful. You can log the losses to W&B by passing report_to="wandb" to TrainingArguments.
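For example (a quick sketch; everything apart from report_to is just an illustrative value):

from transformers import TrainingArguments

# report_to="wandb" sends the Trainer's logs (train/eval loss etc.) to W&B.
# output_dir and the step settings below are only example values.
training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",
    eval_steps=20,
    logging_steps=20,
    report_to="wandb",
)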
Hey Scott,
I think it’s helpful, but I already do that. Anyway, I want to find a way to plot the losses directly in my notebook… Any idea how to achieve that? Cheers
Note that the validation loss being smaller than the training loss is not necessarily bad or weird when working with advanced architectures and techniques, since you are not really comparing equivalent things. For example, consider dropout, which “cancels” some connections during training while using all of them during evaluation (validation).
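As a quick toy illustration of that in PyTorch:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()   # training mode: roughly half the values are zeroed, the rest are scaled up
print(drop(x))

drop.eval()    # evaluation mode: dropout is a no-op, every value passes through unchanged
print(drop(x))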
I trained a few other BERT models, and it seems that all of them need a few steps (up to 50) until the training loss becomes lower than the validation loss, even with different random states etc. Do you think I don’t really have to worry? After those “starting problems” the losses look normal/healthy for my taste (0.3 vs 0.6 when finished with early stopping).
I obviously can’t say! But the fact that the validation loss is lower than the training loss would not be a big concern to me! How those losses evolve seems more important. And of course, if the model’s performance actually improves over time, that’s even more relevant! (You can see this in downstream tasks if you are training a language model.)
Hey scottire, is it possible to obtain the training metrics and load them into a pandas DataFrame? I’m looking to plot these scores in matplotlib so that I can compare them with models trained in other frameworks.
Also, when using wandb, is there a way to view the plot against epochs rather than steps?
import pandas as pd
import wandb

api = wandb.Api()
entity, project = "<entity>", "<project>"  # set to your entity and project
runs = api.runs(entity + "/" + project)

summary_list, config_list, name_list = [], [], []
for run in runs:
    # .summary contains the output keys/values for metrics like accuracy.
    # We call ._json_dict to omit large files
    summary_list.append(run.summary._json_dict)

    # .config contains the hyperparameters.
    # We remove special values that start with _.
    config_list.append(
        {k: v for k, v in run.config.items()
         if not k.startswith('_')})

    # .name is the human-readable name of the run.
    name_list.append(run.name)

runs_df = pd.DataFrame({
    "summary": summary_list,
    "config": config_list,
    "name": name_list
})
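Note that run.summary only holds the latest/final value of each metric, so the DataFrame above won’t give you the full loss curve. To plot the curve (e.g. against epochs rather than steps), you can pull a run’s logged history as a pandas DataFrame with run.history(). A rough sketch; the column names ("train/loss", "eval/loss", "train/epoch") are assumptions and depend on what your Trainer integration actually logged, so check history_df.columns first:

import matplotlib.pyplot as plt

run = api.run(entity + "/" + project + "/<run_id>")  # <run_id> is a placeholder for one of your run ids
history_df = run.history()  # (sampled) step-wise metrics as a pandas DataFrame

# Assumed column names; inspect history_df.columns for your own runs.
train = history_df[["train/epoch", "train/loss"]].dropna()
evals = history_df[["train/epoch", "eval/loss"]].dropna()

plt.plot(train["train/epoch"], train["train/loss"], label="train loss")
plt.plot(evals["train/epoch"], evals["eval/loss"], label="eval loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()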