Running Optuna on Two HuggingFace Trainer Tasks

Hello all, I am trying to integrate Optuna to run a hyperparameter sweep with the HuggingFace Trainer. For my purposes, I am using one of the pre-trained models from HuggingFace and performing two downstream tasks with it. The first task is fine-tuning on the Multi-NLI dataset. Once training for that task is finished, I save my model locally and then use that fine-tuned model for my second downstream task, which is training on the BoolQ dataset.

I want Optuna to start pruning bad trials only once I reach my second downstream task. So even though I am passing in the hyperparameters for both tasks, I don’t want any pruning to happen during the first task. How would I go about setting this pipeline up?

This is my training function:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)


def train_model(
    train_dataset,
    val_dataset,
    num_train_epochs,
    train_batch_size,
    learning_rate,
    lr_scheduler_type,
    model_name_or_path,
    num_labels=2,
    hp_space=None,
    load_finetuned=False,
):
    # DEVICE is a module-level constant defined elsewhere in the script.
    if not load_finetuned:
        # First downstream task: build the Trainer and hand it back untrained.
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name_or_path, num_labels=num_labels
        )
        model.to(DEVICE)

        tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
        data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

        training_args = TrainingArguments(
            overwrite_output_dir=True,
            per_device_train_batch_size=train_batch_size,
            learning_rate=learning_rate,
            num_train_epochs=num_train_epochs,
            lr_scheduler_type=lr_scheduler_type,
            report_to="none",
            evaluation_strategy="epoch",
            save_strategy="no",
            save_total_limit=2,
            load_best_model_at_end=False,
        )
        trainer = Trainer(
            model,
            training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
            tokenizer=tokenizer,
        )
        # trainer.train()
        # trainer.save_model("best_model_downstream1")
        return trainer

    else:
        # Second downstream task: model_name_or_path points at the fine-tuned model.
        model = AutoModelForSequenceClassification.from_pretrained(
            model_name_or_path, num_labels=num_labels
        )
        model.to(DEVICE)

        tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
        data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
        training_args = TrainingArguments(
            output_dir="downstream2",
            overwrite_output_dir=True,
            per_device_train_batch_size=train_batch_size,
            learning_rate=learning_rate,
            num_train_epochs=num_train_epochs,
            lr_scheduler_type=lr_scheduler_type,
            evaluation_strategy="epoch",
            report_to="none",
            save_strategy="no",
            save_total_limit=2,
            load_best_model_at_end=False,
        )
        trainer = Trainer(
            model,
            training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
            tokenizer=tokenizer,
            compute_metrics=compute_metrics_for_eval_during_training,
        )
        # Train on the second task and save the result.
        trainer.train()
        trainer.save_model("best_model_downstream2")
        return trainer
```

And this is my evaluation function:

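```python
import evaluate
import numpy as np


def compute_metrics_for_eval_during_training(eval_preds):
    # F1 over the argmax of the logits, as passed to Trainer(compute_metrics=...).
    metric = evaluate.load("f1")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```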

And this is my hyperparameter_space:

```python
def hyperparameter_space(trial):
    # EPOCHS, BATCH_SIZE, LEARNING_RATES, SCHEDULERS and MODEL_NAMES are
    # lists of candidate values defined elsewhere in the script.
    epochs = trial.suggest_categorical("epochs", EPOCHS)
    batch_size = trial.suggest_categorical("batch_size", BATCH_SIZE)
    learning_rate = trial.suggest_categorical("learning_rate", LEARNING_RATES)
    scheduler = trial.suggest_categorical("scheduler", SCHEDULERS)
    model_name = trial.suggest_categorical("model_name", MODEL_NAMES)

    hp_space = {
        "model_name": model_name,
        "batch_size": batch_size,
        "learning_rate": learning_rate,
        "scheduler": scheduler,
        "epochs": epochs,
    }

    return hp_space
```

Please suggest! Thank you

This thread should be useful: "Using hyperparameter-search in Trainer".

I’d break it into two steps. The first would be to train the model on the first dataset (with or without the HP search). The second would be to do an HP search starting from the model produced by the first step. You can call .from_pretrained("trained_model_dir") to load the model trained in the first step.
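A rough sketch of that second step, assuming the first-stage model (and its tokenizer) was saved to a directory such as `best_model_downstream1`. Because the first task runs outside the Optuna study, pruning can only ever affect the second task. The function names, the search ranges, and the `MedianPruner` choice are illustrative assumptions. The argument names follow the ones used in the question (newer transformers releases rename some of them), and as far as I know extra keyword arguments to `hyperparameter_search` are forwarded to `optuna.create_study`.

```python
def stage2_hp_space(trial):
    """Search space for the second task only; keys are TrainingArguments fields."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32]
        ),
        "num_train_epochs": trial.suggest_categorical("num_train_epochs", [2, 3]),
        "lr_scheduler_type": trial.suggest_categorical(
            "lr_scheduler_type", ["linear", "cosine"]
        ),
    }


def search_second_task(stage1_dir, train_dataset, val_dataset, compute_metrics, n_trials=20):
    """Run an Optuna search on the second task, starting from the stage-1 model."""
    # Heavy imports are kept local so the search space above has no dependencies.
    import optuna
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(stage1_dir)

    def model_init(trial=None):
        # Every trial restarts from the same stage-1 checkpoint.
        # ignore_mismatched_sizes lets the classification head be re-initialised
        # if the two tasks have a different number of labels.
        return AutoModelForSequenceClassification.from_pretrained(
            stage1_dir, num_labels=2, ignore_mismatched_sizes=True
        )

    args = TrainingArguments(
        output_dir="downstream2",
        evaluation_strategy="epoch",  # the objective is reported at each evaluation
        save_strategy="no",
        report_to="none",
    )
    trainer = Trainer(
        model_init=model_init,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
    )
    return trainer.hyperparameter_search(
        hp_space=stage2_hp_space,
        compute_objective=lambda metrics: metrics["eval_f1"],
        direction="maximize",
        backend="optuna",
        n_trials=n_trials,
        pruner=optuna.pruners.MedianPruner(),  # assumed pruner, swap as needed
    )
```

You would call `search_second_task("best_model_downstream1", boolq_train, boolq_val, compute_metrics_for_eval_during_training)`, where the two dataset names are placeholders for your tokenized BoolQ splits.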

Thank you for the reply! I was also thinking of doing something similar. If I am understanding correctly, are you saying to split the two training steps into two separate hyperparameter sweeps?

Another challenge is that one of the things I want to test is which model would be best to use. However, I don’t know how to pass a different model into the Trainer on each run.

You can create a loop, either inside the script or outside it, that passes arguments (like model_name_or_path) to the training code.

But would that not run Optuna on each model separately?

Do you mean something like this?

    for model in model_lst:
        # run trainer on hp_space

With that approach, however, the model itself would not be treated as a hyperparameter by Optuna, right? Are there any examples available for the approach that you have suggested? Thanks again for your help!

> With that approach, however, the model itself would not be treated as a hyperparameter by Optuna, right?

Model types are not hyperparameters, so your statement is correct.

> Are there any examples available for the approach that you have suggested?

Nope. You can implement it in a couple of lines of code, though.
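
A minimal sketch of such a loop, in case it helps. `sweep_over_models` and `run_search` are hypothetical names: `run_search(model_name)` stands for whatever runs one search for a single checkpoint and returns its best score and hyperparameters, with higher scores being better.

```python
def sweep_over_models(model_names, run_search):
    """Run one hyperparameter search per checkpoint and pick the best one.

    run_search(model_name) is a caller-supplied function (hypothetical here)
    returning (best_score, best_params) for that checkpoint.
    """
    if not model_names:
        raise ValueError("model_names must contain at least one checkpoint")

    results = {}
    for name in model_names:
        # Each checkpoint gets its own independent search.
        score, params = run_search(name)
        results[name] = {"score": score, "params": params}

    # Compare the per-model winners after all searches have finished.
    best_model = max(results, key=lambda n: results[n]["score"])
    return best_model, results
```

With the Trainer, `run_search` could wrap a call to `trainer.hyperparameter_search(...)` and return the `objective` and `hyperparameters` fields of the result. Each model then gets its own Optuna study, and the comparison across models happens in plain Python afterwards.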