Weights and Biases not showing train loss correctly

Hi all, I’m training a binary text classification model. To debug, I’m training and evaluating on a small subset of the data (around 16 data points) to see if the model can successfully overfit. However, the train_loss logged to Weights and Biases is not showing correctly – as you can see from the screenshot, it’s just a single point. Any idea why this is happening?

Below is my training code:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large")

training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    evaluation_strategy="epoch",
    logging_steps=1,
    # weight_decay=0.01,
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_ds,
    eval_dataset=encoded_ds,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    # data_collator=data_collator,
)
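
(For reference, a minimal compute_metrics for a binary classification setup like this would look something like the sketch below; the exact metric shouldn’t matter for the logging issue.)

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred carries the logits and the true labels for the eval set
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}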

What happens if you drop your batch size (both train and eval) to 1? And set

max_steps=16,
evaluation_strategy="steps"

Not sure if it will fully fix things, but it might help diagnose what is going on.
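
Something along these lines (just a sketch; eval_steps is my guess so it evaluates every step):

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    max_steps=16,                   # overrides num_train_epochs
    evaluation_strategy="steps",
    eval_steps=1,                   # evaluate every step
    logging_steps=1,                # log the training loss every step
)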

Hey! After doing some digging, I think you just need to scroll through the nine panels available (only six of the nine are displayed in your screenshot) and you’ll find the actual training loss under train/loss rather than train/train_loss. W&B logs the various things the Hugging Face Trainer sends it, and train/train_loss appears to be a single summary value sent at the end of training, which is why it just looks like a dot. Let me know if that solves it or if you have other questions 🙂
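
If you want to sanity-check it outside of W&B, the same values also end up in trainer.state.log_history, so you can print them after training (a quick sketch):

trainer.train()

# Entries with a "loss" key are the per-step training losses (what W&B shows
# as train/loss); the final entry carries "train_loss", the single summary
# value logged at the end of training, which is why train/train_loss is a dot.
for entry in trainer.state.log_history:
    if "loss" in entry:
        print(entry["step"], entry["loss"])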
