Transformers and Hyperparameter search using Optuna

Thank you very much for your detailed answer!

For point 1, I have also noticed two things that are not straightforward to me, although I haven't put much effort into them yet. I'm just mentioning them in case you have seen something similar:

My code for the hyperparameter search is the following:

  from transformers import Trainer, TrainingArguments

  training_args = TrainingArguments(
      output_dir=SAVE_DIR,
      per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH,
      per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH,
      evaluation_strategy="epoch",
      logging_strategy="epoch",
  )

  trainer = Trainer(
      model=None,  # the model is built per trial via model_init
      args=training_args,
      train_dataset=tokenized_dset["train"],
      eval_dataset=tokenized_dset["validation"],
      model_init=model_init,
  )

  best_trial = trainer.hyperparameter_search(
      direction="minimize",
      backend="optuna",
      hp_space=optuna_hp_space,
      n_trials=NUM_TRIALS,
  )
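
For completeness, the two helpers referenced above look roughly like this in my setup (MODEL_NAME and the search ranges are just placeholders, my actual values differ):

  import optuna
  from transformers import AutoModelForSequenceClassification

  def optuna_hp_space(trial: optuna.Trial) -> dict:
      # The sampled values are merged into TrainingArguments for each trial.
      return {
          "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
          "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
      }

  def model_init(trial=None):
      # A fresh model for every trial, so weights don't leak between trials.
      return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
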
  1. Even though I have set logging_strategy="epoch", checkpoints are written every 500 steps, so in the output directory I have checkpoint-500, checkpoint-1000, etc., which is confusing since each checkpoint spans several epochs (see the sketch after this list).

  2. In the directory there is a JSON file that contains the whole history of epochs (it may be the same as trainer.state.log_history); however, unless I am mistaken, the metrics of the last epoch are not logged in this file (also addressed in the sketch below).
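
For both points, here is what I plan to try (a sketch based on my reading of the docs: the checkpoint-500 folders come from the default save_strategy="steps" with save_steps=500, independently of logging_strategy, and the output file name below is my own placeholder):

  import json
  from transformers import TrainingArguments

  training_args = TrainingArguments(
      output_dir=SAVE_DIR,
      evaluation_strategy="epoch",
      logging_strategy="epoch",
      save_strategy="epoch",  # save checkpoints per epoch instead of every 500 steps
  )

  # After training, dump the full history myself so the last epoch is
  # definitely included, instead of relying on trainer_state.json:
  with open("log_history.json", "w") as f:
      json.dump(trainer.state.log_history, f, indent=2)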

For point 2, yes, I ran a study with 10 trials and noticed that some were pruned, so I guess it is working. I will definitely consider the SQLite database, since it seems very handy for keeping all the information about the trials.
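
If I read the Trainer docs correctly, extra keyword arguments of hyperparameter_search are forwarded to optuna.create_study, so persisting the study could look like this (the study name and database path are placeholders):

  best_trial = trainer.hyperparameter_search(
      direction="minimize",
      backend="optuna",
      hp_space=optuna_hp_space,
      n_trials=NUM_TRIALS,
      # forwarded to optuna.create_study:
      study_name="hp-search",
      storage="sqlite:///optuna_trials.db",
      load_if_exists=True,  # reuse the same study across runs
  )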

Thank you for the caveats too. I will keep them in mind, and if I find a way to get the whole study back as an object, I will let you know!
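
I suspect that with the SQLite storage in place it is as simple as the snippet below (same placeholder names as above), but I haven't verified it yet:

  import optuna

  study = optuna.load_study(study_name="hp-search", storage="sqlite:///optuna_trials.db")
  print(study.best_trial.params)  # best hyperparameters found so far
  df = study.trials_dataframe()   # one row per trial, pruned ones included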

Thanks again,
Petrina