Using hyperparameter-search in Trainer

Hey @dunalduck0, one usually just tracks the loss or perplexity for GPT-like models. You can compute the losses by adapting the evaluation code in this example :slight_smile:
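For example, a minimal sketch of turning the evaluation loss into perplexity, assuming a Trainer already set up for causal language modeling:

import math

# trainer is assumed to be a transformers.Trainer configured for causal LM;
# evaluate() returns a metrics dict containing the mean evaluation loss
metrics = trainer.evaluate()
perplexity = math.exp(metrics["eval_loss"])
print(f"eval loss: {metrics['eval_loss']:.4f}, perplexity: {perplexity:.2f}")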

I have a question: if I want to test different learning rates, should I write "learning_rate": tune.loguniform(1e-4, 2e-5, 5e-5, 1e-5, 1e-2), or will tune.loguniform(1e-4, 1e-2) already try different learning rates?
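For comparison, here is a minimal sketch of the two ways such a space is usually declared with Ray Tune: tune.loguniform(1e-5, 1e-2) samples the whole range continuously on a log scale, while tune.choice picks from an explicit list of candidate values.

from ray import tune

# continuous: any value in [1e-5, 1e-2], sampled log-uniformly per trial
hp_space_continuous = {"learning_rate": tune.loguniform(1e-5, 1e-2)}

# discrete: one of the listed candidates per trial
hp_space_discrete = {"learning_rate": tune.choice([1e-5, 2e-5, 5e-5, 1e-4])}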

Hello,

I am using this code to find the best parameters for my model.

from ray.tune.schedulers import PopulationBasedTraining
from random import randint, uniform

# Population Based Training: periodically replaces the worst trials with
# copies of the best ones and perturbs/resamples their hyperparameters.
scheduler = PopulationBasedTraining(
    mode="max",
    metric="exact_match",  # e.g. mean_accuracy
    perturbation_interval=2,
    hyperparam_mutations={
        # callables are re-sampled on each perturbation
        "weight_decay": lambda: uniform(0.0, 0.3),
        "learning_rate": lambda: uniform(1e-5, 5e-5),
        # lists are sampled from uniformly
        "per_gpu_train_batch_size": [3, 4, 5],
        "num_train_epochs": [10, 11, 12],
        "warmup_steps": lambda: randint(0, 500),
    },
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=4,
    keep_checkpoints_num=2,
    scheduler=scheduler
)

However, I am getting this error. Do you have any advice?

/usr/local/lib/python3.7/dist-packages/pyarrow/io.pxi in pyarrow.lib.Buffer.__reduce_ex__()

AttributeError: module 'pickle' has no attribute 'PickleBuffer'

Some people recommend using Python 3.8 instead of Python 3.7; however, this workaround did not resolve the issue for me. I am working in Google Colab.

Thanks in advance.


I see strange behaviour when I am using a custom HP function: the results are the same across all trials and epochs.

Default example:


import numpy as np
from datasets import load_metric
from transformers import Trainer, TrainingArguments

def compute_metrics(eval_preds):
    metric = load_metric("f1")
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels, average="weighted")

args = TrainingArguments(
    MODEL_NAME,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=TR_BATCH_SIZE,
    per_device_eval_batch_size=TEST_BATCH_SIZE,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    push_to_hub=False,
)

# use one tenth of the training data to speed up the search
train_dataset = tokenized_train["train"].shard(index=1, num_shards=10)

trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=tokenized_test["train"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

The results are:

[image: trial results]

But when I am using a custom one:

def my_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 3),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [1, 2, 4, 6, 8]),
    }

trainer.hyperparameter_search(direction="maximize", hp_space=my_hp_space)

[image: trial results with the custom hp_space]

This helped me on Google Colab:
!pip install pickle5
Then
import pickle5 as pickle
After the first run there will be a pickle warning telling you to restart the notebook, and the same error again. After the second “Restart and run all”, the Ray Tune hyperparameter search begins.
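In short, the workaround amounts to these Colab cells (a sketch):

# cell 1: install the backport that provides pickle.PickleBuffer on Python 3.7
!pip install pickle5

# cell 2: import the backport under the name pickle
import pickle5 as pickle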

Hey @sgugger, do you know if it's possible to use cross-validation with Optuna for the hyperparameter search?
I found this, which resembles what I'm looking for. I was wondering if it is implemented inside the Trainer?
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.integration.OptunaSearchCV.html

Thanks!