I’d like to fine-tune for a regression task rather than a classification task. How do I change the default loss in either TrainingArguments or Trainer()?
You can override the compute_loss method of the Trainer, like so:
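from torch import nn
from transformers import Trainer


class RegressionTrainer(Trainer):
    # **kwargs absorbs extra arguments (e.g. num_items_in_batch) that newer Trainer versions pass
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # Mean squared error between the single model output and the float labels
        loss_fct = nn.MSELoss()
        loss = loss_fct(logits.squeeze(), labels.squeeze())
        return (loss, outputs) if return_outputs else loss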
However, several models in the library have a config attribute called problem_type, which you can set to "regression". In that case, you shouldn’t override anything, and you can just use the default loss of the model.
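To make the problem_type route concrete, here is a minimal sketch. It builds a tiny, randomly initialised DistilBERT from a config so nothing has to be downloaded; the small sizes, the random input ids and the label values are assumptions chosen purely for illustration. It checks that the loss the model returns matches a hand-computed mean squared error.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny illustrative config (sizes are arbitrary, not those of a real checkpoint).
config = DistilBertConfig(
    vocab_size=100,
    dim=32,
    n_layers=1,
    n_heads=2,
    hidden_dim=64,
    max_position_embeddings=16,
    num_labels=1,               # a single output value per example
    problem_type="regression",  # the model's built-in loss becomes MSE
)
model = DistilBertForSequenceClassification(config)
model.eval()  # disable dropout so the forward pass is deterministic

input_ids = torch.randint(0, 100, (4, 8))        # 4 fake sequences of 8 tokens
labels = torch.tensor([0.5, 1.0, -0.3, 2.0])     # float targets for regression

with torch.no_grad():
    outputs = model(input_ids=input_ids, labels=labels)

# Recompute the MSE by hand from the returned logits for comparison.
manual_mse = torch.mean((outputs.logits.squeeze(-1) - labels) ** 2)
print(outputs.loss.item(), manual_mse.item())
```

With a pretrained checkpoint, the same two settings can be passed to from_pretrained, e.g. from_pretrained("distilbert-base-uncased", num_labels=1, problem_type="regression"), and a plain Trainer can then be used without a custom compute_loss.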
Thank you!
@nielsr, I tried this and the model’s predictions concentrate around a single predicted value. I’m almost exactly copying the fine-tuning tutorial. Any idea why the model seems to be failing to learn?
My code:
import os

import torch
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=results_dir,          # output directory
    num_train_epochs=50,             # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir=results_dir,         # directory for storing logs
    logging_steps=10,
    report_to='wandb',
    do_eval=True,
    evaluation_strategy="steps",     # named eval_strategy in newer transformers releases
    eval_steps=10,
)


class RegressionTrainer(Trainer):
    def compute_loss(self,
                     model,
                     inputs,
                     return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        loss = torch.mean(torch.square(logits.squeeze() - labels.squeeze()))
        return (loss, outputs) if return_outputs else loss


pytorch_model_save_path = os.path.join(results_dir, 'pytorch_model.bin')
if os.path.isfile(pytorch_model_save_path):  # If model was already fine-tuned
    # yes, pass the whole results dir; see https://github.com/huggingface/transformers/issues/1620
    model = DistilBertForSequenceClassification.from_pretrained(
        results_dir,
        num_labels=1)
else:  # If model needs to be fine-tuned
    # Set output dimension to 1 to perform regression
    model = DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased",
        num_labels=1)

trainer = RegressionTrainer(
    model=model,                  # the instantiated 🤗 Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # compute_metrics=compute_eval_metrics,
)

if force_train or not os.path.isfile(pytorch_model_save_path):
    trainer.train()
    trainer.save_model(output_dir=results_dir)

all_prediction_output = trainer.predict(all_dataset)
all_predictions = all_prediction_output.predictions
all_predictions = all_predictions.squeeze()
all_labels = all_prediction_output.label_ids
This is weird because my validation loss is going down, but in a suspiciously smooth way:
Turns out there was no error! Two things:
- The learning rate was small and the validation loss was being evaluated very frequently (every 10 steps), which explains why the validation loss curve was so smooth.
- I needed to run 50 training epochs to see a real difference. That seems odd, but so be it.
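One way to tell "still only predicting a single value" apart from "actually learning" is to compare the model's MSE with the MSE of a constant predictor that always outputs the mean label (which equals the variance of the labels). The sketch below is only an illustration of that check: the helper names mse and constant_baseline_mse are hypothetical, and the numbers are made up, not taken from the run above.

```python
from statistics import fmean


def mse(predictions, labels):
    """Mean squared error between two equal-length sequences."""
    return fmean((p - y) ** 2 for p, y in zip(predictions, labels))


def constant_baseline_mse(labels):
    """MSE of always predicting the mean label (the variance of the labels)."""
    mean_label = fmean(labels)
    return mse([mean_label] * len(labels), labels)


# Made-up example: predictions that cluster around one value.
labels = [1.0, 2.0, 3.0, 4.0]
collapsed = [2.4, 2.5, 2.6, 2.5]

baseline = constant_baseline_mse(labels)
model_mse = mse(collapsed, labels)
ratio = model_mse / baseline

# A ratio close to 1 means the model is barely better than predicting the mean;
# it should fall well below 1 once training has made real progress.
print(f"baseline={baseline:.3f} model={model_mse:.3f} ratio={ratio:.3f}")
```

In the setup above, all_predictions and all_labels could be passed to the same helpers after converting them to plain lists.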