Running Optuna on Two HuggingFace Trainer Tasks

Satyen · October 5, 2022, 1:59pm

Hello all, I am trying to integrate Optuna to perform a hyperparameter sweep using the HuggingFace Trainer. I am using one of the pre-trained models from HuggingFace and performing two downstream tasks with it. The first task is fine-tuning on the Multi-NLI dataset. Once training for that task is finished, I save the model locally and then use that fine-tuned model for my second downstream task, which is training on the BoolQ dataset.

I want Optuna to start pruning bad trials only once I reach the second downstream task. So even though I am passing in the hyperparameters for both tasks, I don’t want any pruning until the second task. How would I go about setting up this pipeline?

This is my training function:

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train_model(
    train_dataset,
    val_dataset,
    num_train_epochs,
    train_batch_size,
    learning_rate,
    lr_scheduler_type,
    model_name_or_path,
    num_labels=2,
    hp_space=None,
    load_finetuned=False,
):

    if not load_finetuned:
        # Downstream task 1 (Multi-NLI): build a Trainer from the pre-trained checkpoint.
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name_or_path, num_labels=num_labels
        )
        model.to(DEVICE)

        tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
        data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

        training_args = TrainingArguments(
            output_dir="downstream1",
            overwrite_output_dir=True,
            per_device_train_batch_size=train_batch_size,
            learning_rate=learning_rate,
            num_train_epochs=num_train_epochs,
            lr_scheduler_type=lr_scheduler_type,
            report_to="none",
            evaluation_strategy="epoch",
            save_strategy="no",
            save_total_limit=2,
            load_best_model_at_end=False,
        )
        trainer = Trainer(
            model,
            training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
            tokenizer=tokenizer,
        )
        # trainer.train()
        # trainer.save_model("best_model_downstream1")
        return trainer

    else:
        # Downstream task 2 (BoolQ): model_name_or_path points at the saved task-1 model.
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name_or_path, num_labels=num_labels
        )
        model.to(DEVICE)

        tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
        data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
        training_args = TrainingArguments(
            output_dir="downstream2",
            overwrite_output_dir=True,
            per_device_train_batch_size=train_batch_size,
            learning_rate=learning_rate,
            num_train_epochs=num_train_epochs,
            lr_scheduler_type=lr_scheduler_type,
            evaluation_strategy="epoch",
            report_to="none",
            save_strategy="no",
            save_total_limit=2,
            load_best_model_at_end=False,
        )
        trainer = Trainer(
            model,
            training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
            tokenizer=tokenizer,
            compute_metrics=compute_metrics_for_eval_during_training,
        )
        trainer.train()
        trainer.save_model("best_model_downstream2")
        return trainer
```

And this is my evaluation function:

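```python
import evaluate
import numpy as np

# Load the metric once rather than on every evaluation call.
metric = evaluate.load("f1")


def compute_metrics_for_eval_during_training(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```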
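To make explicit what the Trainer hands to this function (a `(logits, labels)` pair for the whole evaluation set), here is a NumPy-only stand-in for `evaluate.load("f1")`, assuming binary labels with 1 as the positive class. The function name `compute_metrics_numpy_f1` is hypothetical.

```python
import numpy as np


def compute_metrics_numpy_f1(eval_preds):
    """Binary F1 (positive label 1) from (logits, labels), using NumPy only."""
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    tp = np.sum((predictions == 1) & (labels == 1))  # true positives
    fp = np.sum((predictions == 1) & (labels == 0))  # false positives
    fn = np.sum((predictions == 0) & (labels == 1))  # false negatives
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return {"f1": float(f1)}
```

Returning a dict keyed `"f1"` means the Trainer logs it as `eval_f1`.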

And this is my hyperparameter_space:

```python
def hyperparameter_space(trial):
    # EPOCHS, BATCH_SIZE, LEARNING_RATES, SCHEDULERS and MODEL_NAMES are lists
    # of candidate values defined elsewhere in the script.
    epochs = trial.suggest_categorical("epochs", EPOCHS)
    batch_size = trial.suggest_categorical("batch_size", BATCH_SIZE)
    learning_rate = trial.suggest_categorical("learning_rate", LEARNING_RATES)
    scheduler = trial.suggest_categorical("scheduler", SCHEDULERS)
    model_name = trial.suggest_categorical("model_name", MODEL_NAMES)

    hp_space = {
        "model_name": model_name,
        "batch_size": batch_size,
        "learning_rate": learning_rate,
        "scheduler": scheduler,
        "epochs": epochs,
    }

    return hp_space
```

Any suggestions would be appreciated. Thank you!

nbroad · October 5, 2022, 5:37pm

This thread should be useful: "Using hyperparameter-search in Trainer".

I’d break it into two steps. The first would be to train the model on the first dataset (with or without a hyperparameter search). The second would be to run a hyperparameter search using the model from the first step. You can call .from_pretrained("trained_model_dir") to load the model trained in the first step.

Satyen · October 5, 2022, 6:28pm

Thank you for the reply! I was also thinking of doing something similar. So, if I understand correctly, you are suggesting I split the two training steps into two separate hyperparameter sweeps?

Another challenge I thought of: one of the things I want to test is which model works best. However, I don’t know how to pass a different model into the Trainer on each run.
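For reference, the Trainer's `model_init` argument may take the Optuna trial as its single argument, which lets the checkpoint itself be sampled per trial. A minimal sketch, assuming two placeholder BERT checkpoints and a hypothetical helper `choose_model_name`:

```python
from transformers import AutoModelForSequenceClassification

# Placeholder candidates; substitute your own MODEL_NAMES list.
MODEL_NAMES = ["bert-base-uncased", "bert-large-uncased"]


def choose_model_name(trial):
    # The Trainer calls model_init once with trial=None when it is
    # constructed, so fall back to the first candidate in that case.
    if trial is None:
        return MODEL_NAMES[0]
    # Within one trial, repeating a suggest call with the same name and
    # choices returns the value that was already sampled.
    return trial.suggest_categorical("model_name", MODEL_NAMES)


def model_init(trial):
    # Pass as Trainer(model_init=model_init, ...); during
    # hyperparameter_search this receives the current Optuna trial.
    return AutoModelForSequenceClassification.from_pretrained(
        choose_model_name(trial), num_labels=2
    )
```

One Trainer holds a single pre-tokenized dataset and data collator, so this is only straightforward when all candidates share a tokenizer (as these two BERT checkpoints do). For models with different tokenizers, the per-model loop discussed in the replies is simpler.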

nbroad · October 6, 2022, 6:57pm

You can create a loop, either inside the script or outside it, that passes arguments (like model_name_or_path) to the script.

Satyen · October 7, 2022, 1:41am

But wouldn’t that run Optuna on each model separately?

Do you mean something like this:

```python
for model in model_lst:
    # run the Trainer's hyperparameter search over hp_space for this model
    ...
```

With that approach, however, the model itself would not be treated as a hyperparameter by Optuna, right? Are there any examples available for the approach you suggested? Thanks again for your help!

nbroad · October 7, 2022, 2:28pm

With that approach, however, the model itself would not be treated as a hyperparameter by Optuna, right?
Model types are not hyperparameters, so your statement is correct.

Are there any examples available for the approach you suggested?

Nope. You can implement it in a couple of lines of code, though.
