Thank you very much for your analytical answer!
For point 1, I have also noticed two things that are not straightforward to me, though I haven't put much effort into them yet. I'm just mentioning them in case you have seen something similar:
My code for the hyperparameter search is the following:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=SAVE_DIR,
    per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH,
    per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
)

trainer = Trainer(
    model=None,  # the model is built per trial via model_init
    args=training_args,
    train_dataset=tokenized_dset['train'],
    eval_dataset=tokenized_dset['validation'],
    model_init=model_init,
)

best_trial = trainer.hyperparameter_search(
    direction="minimize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=NUM_TRIALS,
)
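For context, model_init and optuna_hp_space are along these lines (the model name and the search ranges below are just placeholders, not my actual values):

from transformers import AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder checkpoint name

def model_init(trial):
    # A fresh model is instantiated for every trial so the weights are re-initialised.
    return AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def optuna_hp_space(trial):
    # Placeholder search space; the real ranges depend on the task.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
    }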
- Even though I have set logging_strategy="epoch", I still get output every 500 steps: the directory contains checkpoint-500, checkpoint-1000, etc., which is confusing since each checkpoint spans several epochs (see the sketch after this list).
- In the directory there is also a JSON file that contains the whole history of epochs (it could be the same as trainer.state.log_history); however, if I have noticed correctly, the metrics of the last epoch are not logged in this file.
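My current guess, which may well be wrong, is that the checkpoint-500 folders come from the separate save_strategy/save_steps settings (which I left at their defaults) rather than from logging_strategy, so something like the following might align checkpoint saving with the epoch-level logging; I haven't verified this yet:

training_args = TrainingArguments(
    output_dir=SAVE_DIR,
    per_device_train_batch_size=PER_DEVICE_TRAIN_BATCH,
    per_device_eval_batch_size=PER_DEVICE_EVAL_BATCH,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",  # save a checkpoint per epoch instead of every 500 steps
)

# After training, the in-memory history can be compared with the JSON file:
for entry in trainer.state.log_history:
    print(entry)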
For point 2, yes, I ran a study with 10 trials and noticed that some of them were pruned, so I guess pruning is working. I will definitely consider the SQLite database, since it seems very handy for keeping all the information about the trials.
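What I have in mind for the SQLite storage is roughly the following (the study name and database path are placeholders); if I understand correctly, a study stored this way could later be loaded back in full with optuna.load_study, though I still need to check how to make the Trainer backend write to that storage:

import optuna

# Placeholder study name and database path.
STORAGE = "sqlite:///optuna_trials.db"

study = optuna.create_study(
    study_name="hf_hp_search",
    storage=STORAGE,
    direction="minimize",
    load_if_exists=True,
)

# Later, the whole study (all trials, parameters and values) can be reloaded:
study = optuna.load_study(study_name="hf_hp_search", storage=STORAGE)
print(study.best_trial.params)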
Thank you for the caveats too. I will keep them in mind, and in case I find a clean way to get the whole study back as an object from the Trainer run, I will let you know!
Thanks again,
Petrina