How to set up Trainer for a regression?

Hello,

I am aware that I can run a regression model by using float target values and setting num_labels=1 on a classification head, like below:

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", 
                                                           num_labels=1,
                                                           ignore_mismatched_sizes=True)

The problem is that right now I am merely adapting the Trainer setup from a classification task, so during training I see an accuracy metric where RMSE or R-squared would be more appropriate.

See the accuracy score below on the validation data:

import numpy as np
from datasets import load_metric
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

metric = load_metric("accuracy")

batch_size = 32

args = TrainingArguments(
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    report_to="none",
    weight_decay=0.01,
    output_dir='/content/drive/MyDrive/kaggle/',
    metric_for_best_model='accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # argmax only makes sense for classification logits, not regression outputs
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

which gives

Epoch | Training Loss | Validation Loss | Accuracy
1     | 0.507300      | 0.499625        | 0.503853
2     | 0.466000      | 0.495724        | 0.503853

Which arguments in Trainer should I use so that I get RMSE or R-squared instead? I assume the loss being minimized is already the mean squared error (maybe I am wrong?).

Thanks!


You can use the RMSE metric, like so:

from sklearn.metrics import mean_squared_error

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    rmse = mean_squared_error(labels, predictions, squared=False)
    return {"rmse": rmse}
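Since the question also mentions R-squared, here is a variant of that function (a sketch, assuming scikit-learn is available; it computes RMSE via np.sqrt to avoid the deprecated squared= keyword in newer scikit-learn versions):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # With num_labels=1 the logits have shape (batch, 1); flatten both arrays first.
    predictions = np.asarray(predictions).reshape(-1)
    labels = np.asarray(labels).reshape(-1)
    rmse = np.sqrt(mean_squared_error(labels, predictions))
    r2 = r2_score(labels, predictions)
    return {"rmse": rmse, "r2": r2}
```

Pass this as compute_metrics to the Trainer exactly as in the original snippet; both metrics then show up in the evaluation table each epoch.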

Sources:


ahhh!! thanks @nielsr!! And am I correct to assume that the loss being minimized here (0.507 and 0.466 in the example) is the mean squared error as well?


Yes, you can see that in the source code here.
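Concretely, when num_labels=1 and the labels are floats, the model's regression branch computes MSELoss between the squeezed logits and the labels. In plain Python (with made-up numbers) the reported training loss amounts to:

```python
# Toy illustration of the regression loss: mean squared error between
# the model's squeezed logits and the float labels.
logits = [0.8, 1.9, 3.1]   # made-up predictions, shape (batch,) after squeeze
labels = [1.0, 2.0, 3.0]   # made-up float targets

mse = sum((p - y) ** 2 for p, y in zip(logits, labels)) / len(labels)
print(round(mse, 4))  # → 0.02
```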

Note that you can also set the problem_type of the model to “regression” (which is equivalent to setting num_labels=1).
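Not the exact code from the question, but a sketch of what that looks like with the same checkpoint:

```python
from transformers import AutoModelForSequenceClassification

# problem_type="regression" makes the model use MSELoss; with num_labels=1
# and float labels this is also inferred automatically.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    num_labels=1,
    problem_type="regression",
    ignore_mismatched_sizes=True,
)
```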


Using HF Evaluate:

from evaluate import load

metric = load('mse')

# squared=False returns the root mean squared error (RMSE) instead of MSE
metric.compute(predictions=predictions, references=labels, squared=False)