In trying to use the Trainer’s hyperparameter_search to run trials of my experiments, I found that some arguments seemingly mean the same thing, and I am trying to understand which ones take precedence or whether any of them can be ignored. I think there should be some documentation on this, as it’s easy to get confused.
The duplication comes from the fact that there are three sets of arguments: the arguments to hyperparameter_search itself, the arguments to TrainingArguments, and the additional arguments that can be passed through to the library doing the hyperparameter tuning (such as Ray Tune). These are not necessarily mutually exclusive (at least in the way that I interpret them).
Here is a list of issues regarding parameter confusion that I am having:
- Checkpointing arguments. TrainingArguments and Ray Tune each define their own checkpointing parameters. TrainingArguments has “metric_for_best_model”, “save_steps”, and “save_total_limit”, while Ray Tune has “checkpoint_score_attr”, “checkpoint_freq”, and “keep_checkpoints_num”.
- TrainingArguments’ “load_best_model_at_end”. This is more of a follow-on concern from the checkpointing issue above. I am using WandB to log my results, and for each trial I want it to save the best-performing model to its “artifacts” folder. Can I have confidence that “load_best_model_at_end” will do anything if Ray Tune handles checkpointing on its own side of things?
- Metric optimization parameters. hyperparameter_search has “compute_objective” and “direction”, TrainingArguments has “metric_for_best_model” and “greater_is_better”, and Ray Tune has “metric” and “mode”. I’m assuming hyperparameter_search simply passes its arguments down to the corresponding ones in Ray Tune (or whatever tuning library is being used). To give the user confidence in what is going on, maybe there could be a warning saying that passing those additional Ray Tune parameters is redundant?
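To make the overlap concrete, here is a minimal sketch of the kind of check I have in mind. It only groups the argument names listed above by where they are set and reports when more than one place configures the same concern. The grouping reflects my reading of the names, not confirmed library behavior, and OVERLAPPING_ARGS and find_overlaps are hypothetical names of my own, not part of transformers or Ray Tune.

```python
# Argument names from the list above, grouped by concern and by where they are set.
# ASSUMPTION: this grouping is my own interpretation of the names; it is not
# confirmed behavior of transformers or Ray Tune.
OVERLAPPING_ARGS = {
    "checkpointing": {
        "TrainingArguments": ["metric_for_best_model", "save_steps", "save_total_limit"],
        "ray_tune": ["checkpoint_score_attr", "checkpoint_freq", "keep_checkpoints_num"],
    },
    "metric_optimization": {
        "hyperparameter_search": ["compute_objective", "direction"],
        "TrainingArguments": ["metric_for_best_model", "greater_is_better"],
        "ray_tune": ["metric", "mode"],
    },
}


def find_overlaps(passed):
    """Report concerns that are configured in more than one place.

    `passed` maps a source name (e.g. "TrainingArguments") to the argument
    names the user set there. Returns {concern: sorted list of sources}.
    Hypothetical helper for illustration only.
    """
    overlaps = {}
    for concern, by_source in OVERLAPPING_ARGS.items():
        sources = sorted(
            source
            for source, names in by_source.items()
            if set(names) & set(passed.get(source, ()))
        )
        if len(sources) > 1:
            overlaps[concern] = sources
    return overlaps


# Example: the combination of arguments I am currently unsure about.
example = find_overlaps({
    "hyperparameter_search": ["direction"],
    "TrainingArguments": ["metric_for_best_model", "save_steps"],
    "ray_tune": ["metric", "mode", "keep_checkpoints_num"],
})
for concern, sources in example.items():
    print(f"{concern} is configured in: {', '.join(sources)}")
```

This does not say which setting wins, which is exactly what I would like to have documented; it only flags the combinations where the question arises.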
@sgugger, maybe you can clear up some of this confusion? I know that there is also this post regarding the hyperparameter_search function, but I felt it better to make a new one as this is a bit of a loaded question.