Using hyperparameter-search in Trainer

brgsk · July 19, 2021, 9:37am

Hi, maybe try to store those parameters in a variable outside of get_model(), so it's parameterless. Idk if this solves your problem, feel free to ask.

Gianluca · July 19, 2021, 2:39pm

Well ray-tune sets those parameters (as they are hyperparameters I want to tune). I just don’t know how to get ray-tune to pass those parameters to the model_init function of the Trainer class. In the docs it says: “The function may have zero argument, or a single one containing the optuna/Ray Tune trial object, to be able to choose different architectures according to hyper parameters (such as layer count, sizes of inner layers, dropout probabilities etc).”

brgsk · July 19, 2021, 3:25pm

You can pass those parameters later on. IMO it’s easier to define get_model() without any parameters, and then, having instantiated the Trainer with get_model passed as the model_init argument, you can define the parameters you want to tune and pass them as the hp_space param of the .hyperparameter_search() method.

Below is a snippet that I used to tune hyperparams, hope you’ll find it useful.

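from ray import tune
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (
    RobertaConfig,
    RobertaForQuestionAnswering,
    Trainer,
    TrainingArguments,
)

config = RobertaConfig.from_pretrained(config_path)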

def get_model():
    return RobertaForQuestionAnswering.from_pretrained(
        model_path, config=config)

training_args = TrainingArguments(...)

trainer = Trainer(
    model_init=get_model,  # pass the function itself, not the result of calling it
    args=training_args,
    ...
)

# now this is where you can define your hyperparam space for Tune
tune_config = {
    "lr": tune.uniform(1e-5, 5e-5),
    "weight_decay": tune.choice([0.1, 0.2, 0.3])
}

# and/or, if using scheduler
scheduler = PopulationBasedTraining(
    metric="acc",
    mode="max",
    hyperparam_mutations={
        "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
        ...
    }
)

# finally
trainer.hyperparameter_search(
    hp_space=lambda _: tune_config,
    backend="ray",
    scheduler=scheduler
)

brgsk · July 19, 2021, 3:26pm

Basically just modify this example to meet your goal, that’s what I did.

Gianluca · July 19, 2021, 5:18pm

Well parameters such as the learning_rate or the weight_decay are okay because they do not modify the internal architecture of the transformer model. The problem I have is that I also want to tune hyperparameters corresponding to the internal architecture of my model (alpha and dropout in my case) and not the actual training. The config therefore needs to be modified before the model is instantiated.

sgugger · July 19, 2021, 5:57pm

The model_init function can either have no parameters, or take a trial, which contains the parameters you can set for the model (dropout for instance).
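
A rough sketch of how this dispatch might work, loosely modeled on the call_model_init frames that appear in the traceback later in this thread. call_model_init_sketch is a hypothetical stand-in written for illustration, not the library's code, and the returned dicts stand in for real models.

```python
import inspect


def call_model_init_sketch(model_init, trial=None):
    """Call model_init with no argument or with the trial, based on its signature."""
    argcount = len(inspect.signature(model_init).parameters)
    if argcount == 0:
        return model_init()
    if argcount == 1:
        return model_init(trial)
    raise RuntimeError("model_init should have 0 or 1 argument.")


def fixed_model():
    # Zero-argument form: the architecture is the same for every trial.
    return {"dropout": 0.1}


def tunable_model(trial):
    # One-argument form: fall back to a default when no trial exists yet.
    dropout = 0.1 if trial is None else trial["dropout"]
    return {"dropout": dropout}


print(call_model_init_sketch(fixed_model))
print(call_model_init_sketch(tunable_model))  # no trial yet
print(call_model_init_sketch(tunable_model, {"dropout": 0.3}))
```

The second call is the interesting one: with no trial available, a one-argument model_init receives None and has to cope with it.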

Gianluca · July 20, 2021, 6:32pm

Could I ask for an example? I want it to take the trial object from ray-tune but I don’t understand how it actually gets passed. Right now I am adding a parameter to the model_init function, but whenever it executes the parameter is None, so the trial object is not actually being passed.

I am not sure if there is anything extra I need to do.

This is my code as of now:

from ray.tune.integration.wandb import WandbLoggerCallback

def tune_transformer(num_samples=8, gpus_per_trial=0, smoke_test=False):
    data_dir_name = "./data" if not smoke_test else "./test_data"
    data_dir = os.path.abspath(os.path.join(os.getcwd(), data_dir_name))
    if not os.path.exists(data_dir):
        os.mkdir(data_dir, 0o755)

    def get_model(params):
        db_config = db_config_base
        print(params)
        db_config.update({'alpha': params['alpha'], 'dropout': params['dropout']})
        return DistilBERTForMultipleSequenceClassification.from_pretrained(db_config, num_labels1 = 2, num_labels2 = 8)

    train_dataset = chunked_encoded_dataset['train']
    eval_dataset = chunked_encoded_dataset['validation']

    training_args = TrainingArguments(
        output_dir="DistilBertMultitask_HPsearch",
        learning_rate=1e-5,  # config
        do_train=True,
        do_eval=True,
        no_cuda=gpus_per_trial <= 0,
        evaluation_strategy="steps",
        save_total_limit = 5,
        logging_strategy="steps",
        logging_steps=5,
        eval_steps=5,
        load_best_model_at_end=True,
        metric_for_best_model='eval_s_f1',
        greater_is_better=True,
        num_train_epochs=1,  # config
        per_device_train_batch_size=16,  # config
        per_device_eval_batch_size=16,  # config
        warmup_steps=0,
        weight_decay=0.1,  # config
        logging_dir="./logs",
        skip_memory_metrics=True)

    tune_config_ASHA = {
        "dropout": tune.uniform(0.1, 0.4),
        "alpha": tune.uniform(0.4,0.8),
        "lr": tune.loguniform(1e-5, 1e-4),
        "batch_size": tune.choice([16])
    }

    trainer = Trainer(
        model_init=get_model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics)

    scheduler = ASHAScheduler(
        metric="eval_s_f1",
        mode="max",
        max_t=1,
        grace_period=1,
        reduction_factor=2)

    reporter = CLIReporter(
        parameter_columns={
            "weight_decay": "w_decay",
            "learning_rate": "lr",
            "per_device_train_batch_size": "train_bs/gpu",
            "num_train_epochs": "num_epochs"
        },
        metric_columns=[
            "eval_s_accuracy", "eval_loss", "eval_s_f1", "steps"
        ])

    trainer.hyperparameter_search(
        hp_space=lambda _: tune_config_ASHA,
        backend="ray",
        compute_objective=my_objective,
        direction="maximize",
        n_trials=num_samples,
        resources_per_trial={
            "cpu": 2,
            "gpu": gpus_per_trial
        },
        scheduler=scheduler,
        keep_checkpoints_num=1,
        checkpoint_score_attr="training_iteration",
        stop={"training_iteration": 1} if smoke_test else None,
        progress_reporter=reporter,
        local_dir="~/ray_results/",
        name="tune_transformer_asha",
        loggers=DEFAULT_LOGGERS + (WandbLogger,),
        time_budget_s=60*60*10) # 10 hours

Then when I try to run it:

tune_transformer(num_samples=1, gpus_per_trial=0, smoke_test=True)

I get this error:

<ipython-input-21-564565ca8893> in get_model(params)
     10         db_config = db_config_base
     11         print(params)
---> 12         db_config.update({'alpha': params['alpha'], 'dropout': params['dropout']})
     13         return DistilBERTForMultipleSequenceClassification.from_pretrained(db_config, num_labels1 = 2, num_labels2 = 8)
     14 

TypeError: 'NoneType' object is not subscriptable

sgugger · July 20, 2021, 8:22pm

Are you sure you are on the latest version of Transformers?

Gianluca · July 20, 2021, 10:01pm

Yes, transformers.__version__ says “4.8.2”.
And raytune is the latest version, “1.4.1”

sgugger · July 21, 2021, 12:58pm

Ah yes, the very first time the model is initialized (in the init of the Trainer) you will get a None for that trial (since there is no trial yet).

So you should have a backup for that in your get_model function:

    def get_model(params):
        db_config = db_config_base
        print(params)
        if params is not None:
            db_config.update({'alpha': params['alpha'], 'dropout': params['dropout']})
        return DistilBERTForMultipleSequenceClassification.from_pretrained(db_config, num_labels1 = 2, num_labels2 = 8)

You should then see None printed once, and then the values for each successive trial.
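
A self-contained sketch of that guard, using a plain dict in place of the real config object so it runs without Transformers installed. BASE_CONFIG and the returned dict are hypothetical stand-ins. Copying the base config on each call is an extra precaution assumed here, not something stated in this thread, so that one trial's values do not leak into the next.

```python
import copy

# Hypothetical stand-in for db_config_base.
BASE_CONFIG = {"alpha": 0.5, "dropout": 0.1}


def get_model(params):
    # Copy so each call starts from the untouched defaults.
    config = copy.deepcopy(BASE_CONFIG)
    if params is not None:
        # Only override when a trial actually supplied values.
        config.update({"alpha": params["alpha"], "dropout": params["dropout"]})
    # A real model_init would build and return the model from this config.
    return config


print(get_model(None))
print(get_model({"alpha": 0.7, "dropout": 0.25, "lr": 3e-5}))
```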

Gianluca · July 21, 2021, 4:35pm

Thank you. This seems to solve that issue, but now I run into a new one (they just keep on coming, don’t they).
The new issue is that the trainer seems to expect all hyperparameters (that I am tuning) to be direct arguments to the TrainingArguments class.
The trainer API is raising this exception to be specific:

AttributeError: Trying to set dropout in the hyperparameter search but there is no corresponding field in TrainingArguments.

Clearly ‘dropout’ and my parameter ‘alpha’ aren’t meant to be arguments to TrainingArguments but rather to be passed to model_init function only.
Is this a bug?

sgugger · July 21, 2021, 4:40pm

Indeed, this is a bug. Pushing a fix for this!

Edit: The fix has been merged, could you retry with a fresh install from master?
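
A minimal sketch of the kind of handling this calls for: apply a sampled value to the training arguments only when a matching field exists, and leave the rest for model_init. ArgsStandIn and apply_trial_params are hypothetical names invented for this illustration; this is not the code of the merged fix.

```python
from dataclasses import dataclass, fields


@dataclass
class ArgsStandIn:
    # Hypothetical stand-in for a few TrainingArguments fields.
    learning_rate: float = 1e-5
    weight_decay: float = 0.1


def apply_trial_params(args, params):
    """Set matching fields on args; return the keys that were left alone."""
    known = {f.name for f in fields(args)}
    skipped = []
    for key, value in params.items():
        if key in known:
            setattr(args, key, value)
        else:
            # Not a training argument, so it is up to model_init to use it.
            skipped.append(key)
    return skipped


args = ArgsStandIn()
left_for_model = apply_trial_params(
    args, {"learning_rate": 3e-5, "dropout": 0.2, "alpha": 0.6}
)
print(args.learning_rate, left_for_model)
```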

Gianluca · July 21, 2021, 5:21pm

Thank you! Seems to be running okay now

Steve · July 23, 2021, 10:38am

Hi all!

I’m using Optuna for hyper-parameter search, but I have a doubt/problem.

Here is the performance report obtained while fine-tuning a pretrained BERT for Polarity Classification:

Epoch	Training Loss	Validation Loss	Accuracy	F1	Precision	Recall
1	0.546600	0.438429	0.795547	0.749214	0.745114	0.753995
2	0.277000	0.457287	0.786100	0.742984	0.735595	0.753490
3	0.207700	0.595937	0.800270	0.743335	0.751293	0.736927

Optuna says:

[I 2021-07-23 10:27:45,517] Trial 0 finished with value: 0.7433345815390376 and parameters: {'learning_rate': 4.5470513013108546e-05, 'warmup_steps': 0.6, 'weight_decay': 0.07700782690380507}. Best is trial 0 with value: 0.7433345815390376.

So it seems that it takes the score of the last row as the best value for the run, even though I specified load_best_model_at_end=True and metric_for_best_model="eval_f1".
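
The report above makes the discrepancy concrete. This small check uses only the F1 column from that table and the value from the Optuna log line: the reported value matches the final epoch, not the best one.

```python
# F1 per epoch, copied from the performance report above.
f1_by_epoch = {1: 0.749214, 2: 0.742984, 3: 0.743335}

last_epoch = max(f1_by_epoch)
best_epoch = max(f1_by_epoch, key=f1_by_epoch.get)

print("last epoch:", last_epoch, f1_by_epoch[last_epoch])
print("best epoch:", best_epoch, f1_by_epoch[best_epoch])

# Value reported by Optuna for trial 0 in the log line above.
reported = 0.7433345815390376
matches_last = abs(reported - f1_by_epoch[last_epoch]) < 1e-6
matches_best = abs(reported - f1_by_epoch[best_epoch]) < 1e-6
print("matches last epoch:", matches_last)
print("matches best epoch:", matches_best)
```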

Here is my code snippet:

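import os
from functools import partial

from sklearn.metrics import precision_recall_fscore_support, accuracy_score
from transformers import (
    EarlyStoppingCallback,
    Trainer,
    TrainerCallback,
    TrainingArguments,
)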

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='macro')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

class MemorySaverCallback(TrainerCallback):
    "A callback that deletes the folder in which checkpoints are saved, to save disk space"
    def __init__(self, run_name):
        super(MemorySaverCallback, self).__init__()
        self.run_name = run_name

    def on_train_begin(self, args, state, control, **kwargs):
        print("Removing dirs...")
        if os.path.isdir(f'./{self.run_name}'):
            import shutil
            shutil.rmtree(f'./{self.run_name}')
        else:
            print("\n\nDirectory does not exist")

training_args = TrainingArguments(
    RUN_NAME, 
    num_train_epochs=15,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    evaluation_strategy="epoch",
    logging_strategy="steps",
    logging_steps=1,
    logging_first_step=False,
    overwrite_output_dir=True,
    save_strategy="no",
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model="eval_f1",
)

trainer = Trainer(
    model_init=partial(MyNet,2),
    args=training_args, 
    train_dataset=training_opos.select(range(2000)), 
    eval_dataset=validating_opos,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2), MemorySaverCallback(RUN_NAME)]
)

def my_hp_space_optuna(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 2e-6, 2e-4, log=True),
        "warmup_steps":  trial.suggest_float("warmup_steps", 0., 0.9, step=0.3),
        "weight_decay":  trial.suggest_float("weight_decay", 1e-6, 1e-1)
    }
def my_objective(metrics):
    return metrics["eval_f1"]

sa = trainer.hyperparameter_search(
    direction="maximize", 
    n_trials=1,
    hp_space=my_hp_space_optuna, 
    compute_objective=my_objective
)

Optuna version=2.8.0
Transformers version=4.6.1

Thanks in advance!

phosseini · July 30, 2021, 9:08pm

By trial here you mean an object of type transformers.trainer_utils.BestRun?

sgugger · August 1, 2021, 6:58am

No it’s a trial as provided by the HP-search framework you are using.

lthistlethwaite · August 6, 2021, 7:09pm

I have also implemented my model_init as you suggested with get_model(params) above, but I’m getting error messages related to the config update. Here is my code below, and the resulting error message below that.

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    eval_steps=500,
    gradient_accumulation_steps=1000,
    eval_accumulation_steps=1
)

db_config_base = AutoConfig.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

def model_init(params):
        db_config = db_config_base
        if params is not None:
            db_config.update({'dropout': params['dropout']})
        return AutoModelForSequenceClassification.from_pretrained(db_config, return_dict=True)

def hp_space_ray(trial):
    return {
        "learning_rate": tune.loguniform(1e-6, 1e-4),
        "per_device_train_batch_size": tune.choice([8, 16, 24, 32]),
        "dropout" : tune.uniform(0,1)
    }

trainer = Trainer(
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=cc_train_dataset,
    eval_dataset=cc_val_dataset,
    model_init=model_init,
    compute_metrics=compute_metrics
)

best_trial = trainer.hyperparameter_search(
    hp_space=hp_space_ray,
    direction="maximize", 
    backend="ray",
    n_trials=1,
    resources_per_trial={"gpu": 1}
)

And the error message:

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/DistilBertConfig%20%7B%0A%20%20%22activation%22:%20%22gelu%22,%0A%20%20%22architectures%22:%20%5B%0A%20%20%20%20%22DistilBertForMaskedLM%22%0A%20%20%5D,%0A%20%20%22attention_dropout%22:%200.1,%0A%20%20%22dim%22:%20768,%0A%20%20%22dropout%22:%200.1,%0A%20%20%22hidden_dim%22:%203072,%0A%20%20%22initializer_range%22:%200.02,%0A%20%20%22max_position_embeddings%22:%20512,%0A%20%20%22model_type%22:%20%22distilbert%22,%0A%20%20%22n_heads%22:%2012,%0A%20%20%22n_layers%22:%206,%0A%20%20%22pad_token_id%22:%200,%0A%20%20%22qa_dropout%22:%200.1,%0A%20%20%22seq_classif_dropout%22:%200.2,%0A%20%20%22sinusoidal_pos_embds%22:%20false,%0A%20%20%22tie_weights_%22:%20true,%0A%20%20%22transformers_version%22:%20%224.9.1%22,%0A%20%20%22vocab_size%22:%2030522%0A%7D%0A/resolve/main/config.json

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-27-531b1829db6f> in <module>
----> 1 trainer = Trainer(
      2     args=training_args,
      3     tokenizer=tokenizer,
      4     train_dataset=cc_train_dataset,
      5     eval_dataset=cc_val_dataset,

~/conda/dsEnv/lib/python3.8/site-packages/transformers/trainer.py in __init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers)
    301             if model_init is not None:
    302                 self.model_init = model_init
--> 303                 model = self.call_model_init()
    304             else:
    305                 raise RuntimeError("`Trainer` requires either a `model` or `model_init` argument")

~/conda/dsEnv/lib/python3.8/site-packages/transformers/trainer.py in call_model_init(self, trial)
    906             model = self.model_init()
    907         elif model_init_argcount == 1:
--> 908             model = self.model_init(trial)
    909         else:
    910             raise RuntimeError("model_init should have 0 or 1 argument.")

<ipython-input-26-416a63683956> in model_init(params)
     14         if params is not None:
     15             db_config.update({'dropout': params['dropout']})
---> 16         return AutoModelForSequenceClassification.from_pretrained(db_config, return_dict=True)
     17 
     18 def hp_space_ray(trial):

~/conda/dsEnv/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    376         kwargs["_from_auto"] = True
    377         if not isinstance(config, PretrainedConfig):
--> 378             config, kwargs = AutoConfig.from_pretrained(
    379                 pretrained_model_name_or_path, return_unused_kwargs=True, **kwargs
    380             )

~/conda/dsEnv/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    448         """
    449         kwargs["_from_auto"] = True
--> 450         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    451         if "model_type" in config_dict:
    452             config_class = CONFIG_MAPPING[config_dict["model_type"]]

~/conda/dsEnv/lib/python3.8/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    530                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
    531             )
--> 532             raise EnvironmentError(msg)
    533 
    534         except json.JSONDecodeError:

OSError: Can't load config for 'DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.9.1",
  "vocab_size": 30522
}
'. Make sure that:

- 'DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.9.1",
  "vocab_size": 30522
}
' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'DistilBertConfig {
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.9.1",
  "vocab_size": 30522
}
' is the correct path to a directory containing a config.json file

Gianluca · August 6, 2021, 7:34pm

There was a mistake in my comment above. The way you have it now, the from_pretrained function thinks that the whole config you are passing is the directory path to where your model and config are stored. It can easily be fixed by doing something like this:

def model_init(params):
        db_config = db_config_base
        if params is not None:
            db_config.update({'dropout': params['dropout']})
        return AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=db_config._name_or_path, config = db_config, return_dict=True)

Maybe if you only pass the config using the config keyword argument, that would work as well? I haven’t tried that but this is how I got around this issue.
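
The 404 URL in the error above supports this reading. Decoding the first part of it with the standard library shows that the text sent as the model identifier is the printed form of the config object:

```python
from urllib.parse import unquote

# Leading portion of the URL path from the HTTPError above.
encoded = "DistilBertConfig%20%7B%0A%20%20%22activation%22:%20%22gelu%22,"

decoded = unquote(encoded)
print(decoded)
```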

lthistlethwaite · August 6, 2021, 9:22pm

Thanks for your thoughts! I now changed the model_init() to the below, but now I’m getting that TrainingArguments error:

def model_init(params):
        db_config = db_config_base
        if params is not None:
            db_config.update({'dropout': params['dropout']})
        print("model_init() called. updated config is")
        print(db_config)
        return AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", config=db_config)

It is not really an “error”, as hyperparameter_search() still runs, but a message prints at the top of my ray tune output, which makes me think the dropout parameter (an architecture parameter) is not actually being tuned.

(pid=828) model_init() called. updated config is
(pid=828) DistilBertConfig {
(pid=828)   "_name_or_path": "distilbert-base-uncased-finetuned-sst-2-english",
(pid=828)   "activation": "gelu",
(pid=828)   "architectures": [
(pid=828)     "DistilBertForSequenceClassification"
(pid=828)   ],
(pid=828)   "attention_dropout": 0.1,
(pid=828)   "dim": 768,
(pid=828)   "dropout": 0.15601864044243652,
(pid=828)   "finetuning_task": "sst-2",
(pid=828)   "hidden_dim": 3072,
(pid=828)   "id2label": {
(pid=828)     "0": "NEGATIVE",
(pid=828)     "1": "POSITIVE"
(pid=828)   },
(pid=828)   "initializer_range": 0.02,
(pid=828)   "label2id": {
(pid=828)     "NEGATIVE": 0,
(pid=828)     "POSITIVE": 1
(pid=828)   },
(pid=828)   "max_position_embeddings": 512,
(pid=828)   "model_type": "distilbert",
(pid=828)   "n_heads": 12,
(pid=828)   "n_layers": 6,
(pid=828)   "output_past": true,
(pid=828)   "pad_token_id": 0,
(pid=828)   "qa_dropout": 0.1,
(pid=828)   "seq_classif_dropout": 0.2,
(pid=828)   "sinusoidal_pos_embds": false,
(pid=828)   "tie_weights_": true,
(pid=828)   "transformers_version": "4.9.1",
(pid=828)   "vocab_size": 30522
(pid=828) }
(pid=828) 
(pid=828) Trying to set dropout in the hyperparameter search but there is no corresponding field in `TrainingArguments`.

I will say, though, that I print the db_config in model_init() as each trial is launched, and the same process (pid=828) printed that dropout was updated from its default 0.1 to the trial’s value, 0.15601864044243652. So maybe this warning message is just not supposed to print?

Gianluca · August 6, 2021, 11:04pm

Try installing from master as suggested by sgugger:

!pip install git+https://github.com/huggingface/transformers.git

I assume that the fix hasn’t made it into the latest release yet.
