Specify Loss for Trainer / TrainingArguments

I’d like to fine-tune for a regression task rather than a classification task. How do I change the default loss in either TrainingArguments or Trainer()?

You can override the compute_loss method of the Trainer, like so:

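from torch import nn
from transformers import Trainer

class RegressionTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # Regression targets; the model also receives them through **inputs
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # Mean squared error between predictions and targets
        loss_fct = nn.MSELoss()
        # Squeeze both tensors so their shapes line up element-wise
        loss = loss_fct(logits.squeeze(), labels.squeeze())
        return (loss, outputs) if return_outputs else loss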

However, several models in the library have a config attribute called problem_type, which you can set to "regression". In that case you don't need to override anything: the model's default loss is already the right one.
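
A minimal sketch of the problem_type route, for illustration only. It builds a tiny, randomly initialised DistilBERT from a config so that nothing has to be downloaded; the layer sizes, batch shape, and target values are arbitrary assumptions. In practice you would more likely pass num_labels=1 and problem_type="regression" to from_pretrained. The check at the end compares the model's built-in loss with a hand-computed mean squared error.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny, randomly initialised model; the sizes are illustrative assumptions only.
config = DistilBertConfig(
    vocab_size=100,
    dim=32,
    hidden_dim=64,
    n_layers=1,
    n_heads=2,
    num_labels=1,               # a single output unit
    problem_type="regression",  # ask the classification head for a regression loss
)
model = DistilBertForSequenceClassification(config)
model.eval()  # disable dropout so the comparison below is deterministic

# Fake batch: 4 sequences of 8 token ids (ids start at 1 to avoid the pad id 0).
input_ids = torch.randint(1, config.vocab_size, (4, 8))
labels = torch.tensor([0.5, 1.0, -0.3, 2.0])  # float regression targets

with torch.no_grad():
    outputs = model(input_ids=input_ids, labels=labels)

# Hand-computed MSE on the same logits, for comparison with the built-in loss.
manual_mse = torch.nn.functional.mse_loss(outputs.logits.squeeze(-1), labels)
print(outputs.loss.item(), manual_mse.item())
```

If the two printed numbers agree, the stock Trainer can be used as is, with no compute_loss override.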

Thank you!

@nielsr, I tried this and the model's predictions cluster around a single value. My code follows the fine-tuning tutorial almost exactly. Any idea why the model seems to be failing to learn?

My code:

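    import os

    import torch
    from transformers import (
        DistilBertForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    # results_dir, force_train, train_dataset, eval_dataset and all_dataset are defined elsewhere
    training_args = TrainingArguments(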
        output_dir=results_dir,  # output directory
        num_train_epochs=50,  # total number of training epochs
        per_device_train_batch_size=16,  # batch size per device during training
        per_device_eval_batch_size=64,  # batch size for evaluation
        warmup_steps=500,  # number of warmup steps for learning rate scheduler
        weight_decay=0.01,  # strength of weight decay
        logging_dir=results_dir,  # directory for storing logs
        logging_steps=10,
        report_to='wandb',
        do_eval=True,
        evaluation_strategy="steps",
        eval_steps=10,
    )

    class RegressionTrainer(Trainer):

        def compute_loss(self,
                         model,
                         inputs,
                         return_outputs=False):
            labels = inputs.get("labels")
            outputs = model(**inputs)
            logits = outputs.get('logits')
            loss = torch.mean(torch.square(logits.squeeze() - labels.squeeze()))
            return (loss, outputs) if return_outputs else loss

    pytorch_model_save_path = os.path.join(results_dir, 'pytorch_model.bin')
    if os.path.isfile(pytorch_model_save_path):  # If model was already fine-tuned
        # yes, pass the whole results dir; see https://github.com/huggingface/transformers/issues/1620
        model = DistilBertForSequenceClassification.from_pretrained(
            results_dir,
            num_labels=1)
    else:  # If model needs to be fine-tuned
        # Set output dimension to 1 to perform regression
        model = DistilBertForSequenceClassification.from_pretrained(
            "distilbert-base-uncased",
            num_labels=1)

    trainer = RegressionTrainer(
        model=model,  # the instantiated 🤗 Transformers model to be trained
        args=training_args,  # training arguments, defined above
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # compute_metrics=compute_eval_metrics,
    )

    if force_train or not os.path.isfile(pytorch_model_save_path):
        trainer.train()
        trainer.save_model(output_dir=results_dir)

    all_prediction_output = trainer.predict(all_dataset)

    all_predictions = all_prediction_output.predictions
    all_predictions = all_predictions.squeeze()
    all_labels = all_prediction_output.label_ids
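
The compute_metrics argument is commented out in the Trainer call above, and the post does not show compute_eval_metrics. The following is a hypothetical version, sketched under the assumption that the function receives a (predictions, label_ids) pair, which is how EvalPrediction unpacks. Reporting the standard deviation of the predictions is one simple way to spot predictions collapsing onto a single value; the metric names are my own choice.

```python
import numpy as np

def compute_eval_metrics(eval_pred):
    """Hypothetical regression metrics for Trainer(compute_metrics=...)."""
    predictions, labels = eval_pred
    predictions = np.asarray(predictions).reshape(-1)  # (N, 1) -> (N,)
    labels = np.asarray(labels).reshape(-1)
    errors = predictions - labels
    return {
        "mse": float(np.mean(errors ** 2)),
        "mae": float(np.mean(np.abs(errors))),
        # A value near zero means every prediction is almost the same number.
        "pred_std": float(np.std(predictions)),
    }

# Small stand-in for an evaluation result: three predictions and three targets.
demo = compute_eval_metrics(
    (np.array([[1.0], [2.0], [4.0]]), np.array([1.0, 2.0, 2.0]))
)
print(demo)
```

Passing compute_metrics=compute_eval_metrics to the trainer would then log these values at every evaluation step alongside the loss.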

This is weird: my validation loss is going down, but in a suspiciously smooth way.

Turns out there was no error! Two things:

  1. The learning rate was small and the validation loss was evaluated very frequently (every 10 steps, per eval_steps=10), which explains why the validation curve was so smooth.

  2. I needed to run 50 training epochs to see a real difference. That seems odd, but so be it.
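
To make the first point concrete, here is a rough sketch of a linear warmup-then-decay schedule, which is the Trainer's default scheduler type, using warmup_steps=500 from the arguments above. The base learning rate of 5e-5 is the TrainingArguments default (the post does not set one), and total_steps=10_000 is a made-up figure, since the dataset size is not given.

```python
def linear_schedule_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=10_000):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 10, 100, 500, 5_000, 10_000):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the learning rate at the first evaluation (step 10) is only 1e-6, so consecutive evaluations ten steps apart see almost the same weights, and the validation curve moves in very small increments.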