Can't reproduce Optuna results

Hello all,

I ran a hyperparameter search using Optuna and got a model giving me 83% accuracy. When I then try to repeat this by retraining with the same hyperparameters (including the seed), I cannot reproduce the results. These are my trainer arguments and Optuna search:

# imports assumed for the snippets below
from transformers import Trainer, TrainingArguments
import optuna

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    seed = 0,
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=22,                 # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    learning_rate=5e-5,              # initial learning rate for AdamW optimizer.           
    load_best_model_at_end=False,     # load the best model when finished training (default metric is loss)
    do_train=True,                   # Perform training
    do_eval=True,                    # Perform evaluation
    logging_dir='./logs',            # directory for storing logs    
    logging_steps=10,
    gradient_accumulation_steps=2,   # number of steps to accumulate gradients over before each optimizer update
    fp16=True,                       # use mixed precision
    fp16_opt_level="O2",             # Apex optimization level (letter O, not zero)
    evaluation_strategy="epoch",     # evaluate at the end of each epoch
    save_strategy='no',              # don't save checkpoints; before setting this, the HP search saved every trial and ate disk space
    #save_total_limit=1,             # alternative: keep only one checkpoint so Optuna trials don't fill the disk
)
trainer = Trainer(
    model_init=model_init,
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset,             # evaluation dataset
    compute_metrics=compute_metrics,
    #callbacks=[EarlyStoppingCallback(3, 0.0)] # early stopping if results don't improve after 3 evaluations (= 3 epochs here)
)
best_run = trainer.hyperparameter_search(direction="maximize",
                                         hp_space=my_hp_space,
                                         compute_objective=my_objective, # can't get this working yet; for now I work with the loss
                                         n_trials=50,
                                         pruner=optuna.pruners.NopPruner(),
                                         sampler=optuna.samplers.GridSampler(search_space),
                                         study_name=name,
                                         storage="sqlite:////content/drive/MyDrive/{}.db".format(name), # change this to a local path if you don't want to save to Drive
                                         load_if_exists=True # set to True to resume an existing study with the same name
                                         )

best_run
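For completeness, this is roughly how I retrain with the winning trial afterwards (a minimal sketch; it assumes best_run.hyperparameters only contains TrainingArguments fields sampled by my_hp_space):

# copy the sampled values from the best trial back into the TrainingArguments,
# then train again from scratch via model_init
for key, value in best_run.hyperparameters.items():
    setattr(trainer.args, key, value)

trainer.train()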

I have now also fixed the seeds for NumPy and PyTorch:

RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
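In case it matters, this is the fuller seeding I plan to try next (just a sketch; it assumes transformers' set_seed plus the cuDNN flags cover the remaining GPU-side randomness):

import torch
from transformers import set_seed

RANDOM_SEED = 42

# set_seed seeds Python's random, NumPy, torch and torch.cuda in one call
set_seed(RANDOM_SEED)

# cuDNN can still pick non-deterministic kernels; pinning them trades some speed for repeatability
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False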

Could it be that the classification head is being randomly reinitialised every time I retrain, resulting in different results?
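If that is the cause, one idea is to reseed inside model_init so the freshly initialised classification head gets identical weights on every run (just a sketch; MODEL_NAME and NUM_LABELS are placeholders for my actual checkpoint and label count):

from transformers import AutoModelForSequenceClassification, set_seed

def model_init():
    # reseed right before the model (and its new classification head) is built,
    # so the head starts from the same weights in every trial and rerun
    set_seed(0)  # match TrainingArguments.seed
    return AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_LABELS
    )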


Same issue

I performed 100 runs, and after a few days I ran the best parameters on a different device, and it gave totally different results…

I found the problem in my case: the GPU used during the Optuna search was different from the one I used when I reran the experiment on my own. Check which GPU was used during optimization; maybe that's the case for you as well.
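If it helps, logging something as small as this inside each trial and again in the standalone rerun makes a device mismatch easy to spot (minimal sketch):

import torch

# print which physical GPU this run is actually executing on
print(torch.cuda.get_device_name(torch.cuda.current_device()))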

Best of luck