Plotting separate loss curves for different datasets

jonathanasdf · April 28, 2023, 6:25am

I have multiple datasets mixed together using datasets.interleave_datasets().

In addition to the overall training loss, I would like to plot one loss curve per dataset in wandb so I can get a sense of how it is overfitting to each dataset. In this setup it’s totally possible that a particular dataset might not have produced any examples in between two reporting points, and that is totally fine.

I tried to map() my datasets before interleaving them to add a ‘source’ identifier to them, but in Trainer.compute_loss() the inputs dict doesn’t contain this ‘source’ key. Reading the documentation for Trainer, it says

If it is a [`~datasets.Dataset`], columns not accepted by the `model.forward()` method are automatically removed.

Are there standard ways of achieving what I want, or is there a way to keep the key around?

Topic		Replies	Views
Plot Loss Curve with Trainer() Beginners	9	17893	November 24, 2021
Track more than one loss using Trainer and Wandb Intermediate	1	648	July 11, 2024
Alternating between batches of different datasets Intermediate	0	221	February 8, 2024
Training on multiple datasets Beginners	0	476	January 23, 2024
Issue with WandB Initialization and Data Points Persistence Beginners	0	183	September 30, 2023

Plotting separate loss curves for different datasets

Related topics