Using hyperparameter-search in Trainer

prajjwal1 · August 20, 2020, 12:58pm

This branch hasn’t been merged, but I want to use optuna in my workflow. I have tried it, but I want to confirm that I’m using it correctly. @sgugger (firstly, thanks for the PR) could you please explain what changes I need to make for it to work (defining the search space, getting results on it, and finding the best hyperparams)? Also, is the implementation complete?

sgugger · August 20, 2020, 1:01pm

Hi there!

This is a work in progress, so I’d hold on a tiny bit before you start using it (I’ll actually make some changes today). I’ll add an example in the PR once I’m done (hopefully by end of day) so you (and others) can start playing with it and give us feedback, but be prepared for some slight changes in the API as we polish it (we want to support other hp-search platforms such as Ray).

prajjwal1 · August 20, 2020, 3:54pm

Thanks for the reply. I’ll look forward to the example and to using it. I’ll try to contribute if I come across some rough edges. Trainer changes a lot, and my inherited trainer code breaks after most updates, so I’m prepared for it.

sgugger · August 20, 2020, 8:42pm

Ok, done for today and prepared the road to support ray as well (not working right now though). There is an example on a regression problem in the README cause I didn’t want to launch my GPU setup. Will add a real example soon, but it should be enough to get you going.

prajjwal1 · August 21, 2020, 5:12am

Could you please tell me where that README is? I checked your recent commits on both the trainer_optuna branch and master, and didn’t see it.

sgugger · August 21, 2020, 1:11pm

Sorry, not README, I meant the PR first post.

sgugger · August 21, 2020, 5:32pm

I put a real example now.

BramVanroy · August 21, 2020, 5:37pm

What are the pros/cons of optuna VS ray?

sgugger · August 21, 2020, 5:40pm

Both work with the API. I haven’t used either long enough to have a strong opinion, but basically ray would be better if you have multiple GPUs and optuna might be better with just one, from what I understood.

sgugger · August 24, 2020, 4:02pm

FYI, this has been merged in master. Here is an example of use:

from nlp import load_dataset, load_metric
from transformers import AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
dataset = load_dataset('glue', 'mrpc')
metric = load_metric('glue', 'mrpc')

def encode(examples):
    outputs = tokenizer(examples['sentence1'], examples['sentence2'], truncation=True)
    return outputs

encoded_dataset = dataset.map(encode, batched=True)
# Won't be necessary when this PR is merged with master since the Trainer will do it automatically
encoded_dataset.set_format(columns=['attention_mask', 'input_ids', 'token_type_ids', 'label'])

def model_init():
    return AutoModelForSequenceClassification.from_pretrained('bert-base-cased', return_dict=True)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Evaluate during training and a bit more often than the default to be able to prune bad trials early.
# Disabling tqdm is a matter of preference.
training_args = TrainingArguments("test", evaluate_during_training=True, eval_steps=500, disable_tqdm=True)
trainer = Trainer(
    args=training_args,
    data_collator=DataCollatorWithPadding(tokenizer),
    train_dataset=encoded_dataset["train"], 
    eval_dataset=encoded_dataset["validation"], 
    model_init=model_init,
    compute_metrics=compute_metrics,
)

# Default objective is the sum of all metrics when metrics are provided, so we have to maximize it.
trainer.hyperparameter_search(direction="maximize")

This will use optuna or Ray Tune, depending on which you have installed. If you have both, it will use optuna by default, but you can pass backend="ray" to use Ray Tune. Note that you need an installation from source of nlp to make the example work.

To customize the hyperparameter search space, you can pass a function hp_space to this call. Here is an example if you want to search higher learning rates than the default with optuna:

def my_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16, 32, 64]),
    }

trainer.hyperparameter_search(direction="maximize", hp_space=my_hp_space)

and ray:

def my_hp_space_ray(trial):
    from ray import tune

    return {
        "learning_rate": tune.loguniform(1e-4, 1e-2),
        "num_train_epochs": tune.choice(range(1, 6)),
        "seed": tune.choice(range(1, 41)),
        "per_device_train_batch_size": tune.choice([4, 8, 16, 32, 64]),
    }

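# Pass the Ray search space defined above and select the Ray Tune backend explicitly
trainer.hyperparameter_search(direction="maximize", hp_space=my_hp_space_ray, backend="ray")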

If you want to customize the objective to minimize/maximize, pass along a function to compute_objective:

def my_objective(metrics):
    # Your elaborate computation here
    return result_to_optimize

trainer.hyperparameter_search(direction="maximize", compute_objective=my_objective)
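As a concrete (and hedged) instance of such a function, here is a minimal sketch that optimizes F1 alone instead of the default sum of all metrics. The key names "eval_f1" and "f1" are assumptions about how the metrics dict is labelled in your setup, the numbers in example_metrics are made up for illustration, and f1_objective is just a name picked for this sketch.

```python
def f1_objective(metrics):
    """Return the F1 score alone as the value to optimize.

    Assumption: the metrics dict holds F1 under "eval_f1" or "f1".
    Adjust the keys to whatever your compute_metrics actually returns.
    """
    for key in ("eval_f1", "f1"):
        if key in metrics:
            return float(metrics[key])
    raise KeyError(f"no F1 score found in metrics: {sorted(metrics)}")


# Made-up metrics, only to show the function in isolation
example_metrics = {"eval_loss": 0.41, "eval_accuracy": 0.85, "eval_f1": 0.89}
print(f1_objective(example_metrics))
```

You would then pass it as compute_objective=f1_objective together with direction="maximize". Raising on a missing key is a deliberate choice here: a silently wrong objective would waste every trial.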

prajjwal1 · August 24, 2020, 4:23pm

Thanks. I was following this PR. I wanted to know which types of hyperparams can be tuned with this approach. Does it work with the default ones only (training_args)? What if we have a custom param that we want to tune (for instance a lambda in an objective function)?

sgugger · August 24, 2020, 5:18pm

The hyperparams you can tune must be in the TrainingArguments you passed to your Trainer. If you have custom ones that are not in TrainingArguments, just subclass TrainingArguments and add them in your subclass.

The hp_space function defines the hyperparameter search space (see the code of the default for optuna or Ray in trainer_utils.py and adapt it to your needs), and the compute_objective function should return the objective to minimize/maximize.
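To illustrate the subclassing suggestion without depending on the library, here is a rough sketch. BaseArguments is a hypothetical stand-in for TrainingArguments (which is itself a dataclass), loss_lambda is a made-up custom hyperparameter, and apply_trial_params is only an illustration of copying sampled values onto the arguments by name. How Trainer does this internally is an assumption, not something shown in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class BaseArguments:
    """Hypothetical stand-in for TrainingArguments (not the real class)."""
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0


@dataclass
class MyArguments(BaseArguments):
    """Subclass that adds a custom, tunable field."""
    loss_lambda: float = field(default=0.1, metadata={"help": "Weight of the extra loss term."})


def apply_trial_params(args, params):
    """Copy sampled values onto `args`, rejecting names it does not define."""
    for name, value in params.items():
        if not hasattr(args, name):
            raise AttributeError(f"{name!r} is not a field of {type(args).__name__}")
        setattr(args, name, value)
    return args


# Values a trial might have sampled (made up for illustration)
args = apply_trial_params(MyArguments(), {"learning_rate": 1e-3, "loss_lambda": 0.5})
print(args)
```

With the real classes, you would subclass TrainingArguments the same way, return "loss_lambda" from your hp_space function, and read it from the arguments wherever your custom loss is computed.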

aswincandra · August 31, 2020, 8:48am

Thank you so much! But I have a problem when defining the Trainer. It said, “__init__() got an unexpected keyword argument ‘model_init’”. Does the Trainer not recognize the ‘model_init’ argument?

I think this error is related to the next one, which I get when I call the ‘hyperparameter_search’ method. It said, “‘Trainer’ object has no attribute ‘hyperparameter_search’”.

What should I do? Sorry for the very newbie question, and thank you in advance.

sgugger · August 31, 2020, 11:38am

This is new, so you need an installation from source to use it. Otherwise, it will be in the next release, which is coming soon.

aswincandra · September 3, 2020, 1:16am

Alright, I’m waiting for it!

prajjwal1 · September 3, 2020, 6:26am

FYI, you can pip install now to use this feature. No need to build from source.

aswincandra · September 10, 2020, 3:35am

Oh yeah, thank you, it seems to be released now. But I’m still having a problem with the hyperparameter_search method. I set the backend parameter to ‘optuna’, but the error said: “You picked the optuna backend, but it is not installed. Use pip install optuna.”, even though I had already pip-installed it before the hyperparameter_search line. The same happened when I set the backend parameter to ‘ray’. Have I made a mistake? I’m running my code in Google Colab, by the way.

BramVanroy · September 10, 2020, 11:33am

It means that it is not installed in your current environment. If you are using notebooks, you have to restart the kernel. Python needs to reload the libraries to see which ones are available.
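If you want to check what the running interpreter can see after the restart, a small sketch like the one below may help. It only asks Python whether the package can be found. It does not reproduce the check Trainer performs, and backend_installed is just a helper name made up for this example.

```python
import importlib.util


def backend_installed(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two backends discussed in this thread
for name in ("optuna", "ray"):
    print(name, "importable:", backend_installed(name))
```

If this prints True but the error persists, the library was most likely imported before the backend was installed, so restarting the kernel and re-running the imports is still the thing to try.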

aswincandra · September 14, 2020, 4:34am

Oh yeah, it works now. Thank you so much!

chinhon · September 14, 2020, 6:06am

I wonder if Sylvain or others might have advice on how to make the hyperparameters search more efficient or manageable, time and resource-wise.

I’ve tried slimming down the dataset (500K rows to 90K rows), reducing the number of parameters to tune (to just 1, number of epochs) and changing the “direction” to “minimize” instead of “maximize”.

Is there something else I can do, aside from further cutting down the size of the dataset? I’m running trials on Colab Pro with GPU/high-RAM enabled, and current version looks like it’ll take about 7 hours (perfectly fine for others I’m sure).

I don’t suppose there’s an equivalent of RandomizedSearchCV for trainer?
