Making sense of duplicate arguments in Hugging Face's hyperparameter search workflow

While trying to use the Trainer’s hyperparameter_search to run trials of my experiments, I found that several arguments seemingly mean the same thing, and I am trying to understand which take precedence, or whether any of them can simply be ignored. I think this deserves some documentation, as it’s easy to get confused.
The reason for this duplication is that there are the arguments to hyperparameter_search itself, the arguments to TrainingArguments, and the additional arguments that can be passed to the library doing the hyperparameter tuning (such as Ray Tune), and these are not necessarily mutually exclusive (at least in the way I interpret them).

Here is a list of the points of parameter confusion I am running into:

  1. Checkpointing arguments. TrainingArguments and Ray Tune each define their own checkpointing parameters: TrainingArguments has “metric_for_best_model”, “save_steps”, and “save_total_limit”, while Ray Tune has “checkpoint_score_attr”, “checkpoint_freq”, and “keep_checkpoints_num”.

  2. TrainingArguments’ load_best_model_at_end. This is more of a follow-up to issue #1. I am using WandB to log my results, and for each trial I want it to save the best-performing model to its “artifacts” folder, but can I have confidence that load_best_model_at_end will do anything if Ray Tune handles checkpointing on its own side of things?

  3. Metric optimization parameters. hyperparameter_search has “compute_objective” and “direction”, TrainingArguments has metric_for_best_model and greater_is_better, and Ray Tune has metric and mode. I’m assuming hyperparameter_search simply passes its arguments down to the corresponding ones of Ray Tune (or whatever tuning library is being used). Just to give the user confidence in what is going on, maybe a warning could be emitted saying that passing those additional Ray Tune parameters is redundant? (See the sketch after this list for where each group of arguments lives.)
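
To make the overlap concrete, here is a minimal sketch (not a recommendation) of where each group of arguments lives when using the Ray backend. The metric names are only illustrative, and model_init, train_dataset, and eval_dataset are placeholders assumed to be defined elsewhere:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="hp_search_output",
    evaluation_strategy="steps",
    eval_steps=500,
    # Trainer-side checkpointing arguments (issue 1)
    save_steps=500,
    save_total_limit=2,
    # Trainer-side "best model" arguments (issues 2 and 3)
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model_init=model_init,        # placeholder: returns a fresh model for each trial
    args=training_args,
    train_dataset=train_dataset,  # placeholder
    eval_dataset=eval_dataset,    # placeholder
)

best_run = trainer.hyperparameter_search(
    backend="ray",
    n_trials=10,
    # hyperparameter_search's own metric arguments (issue 3)
    direction="minimize",
    compute_objective=lambda metrics: metrics["eval_loss"],
    # any extra keyword arguments are forwarded to ray.tune.run, so Ray Tune's
    # own checkpointing / metric arguments end up here (issues 1 and 3)
    keep_checkpoints_num=1,
    checkpoint_score_attr="eval_loss",
)
print(best_run.hyperparameters)
```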

@sgugger maybe you can clear up some of this confusion? I know there is also this post regarding the hyperparameter_search function, but I felt it better to make a new one, as this is a bit of a loaded question.

I have also been wondering about these arguments. Still, I can probably clear up some of your questions.

  1. From what I’ve inspected in the logs, the load_best_model_at_end argument is ignored during the hyperparameter search.
  2. Since load_best_model_at_end is ignored, I believe all the corresponding args (e.g. metric_for_best_model) are ignored as well; at least, when I reproduce the results while changing this parameter, I get the same values. (A sketch of that kind of check follows below.)
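
A hedged sketch of how such a check might look (not necessarily how it was done here); build_trainer is a hypothetical helper that constructs a Trainer with a fixed seed and the given flag:

```python
# Hypothetical check: run the same small search twice, toggling
# load_best_model_at_end, and compare the reported objectives.
def run_search(load_best: bool):
    trainer = build_trainer(load_best_model_at_end=load_best)  # hypothetical helper
    return trainer.hyperparameter_search(
        backend="ray",
        n_trials=2,
        direction="minimize",
    )

run_a = run_search(load_best=True)
run_b = run_search(load_best=False)

# Identical values with either setting suggest the flag is being ignored.
print(run_a.objective, run_b.objective)
```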

Hope @sgugger can clarify.

Checkpointing really becomes its own thing with HP search, and is completely untested. I would recommend not using anything fancy, and I would not expect load_best_model_at_end to work during HP search, especially since trials can be aborted by the backend if they are inconclusive before we reach the end; load_best_model_at_end is really meant to be used in a regular training run. The same goes for point 3: those arguments decide which model is the best and also what to monitor for early stopping, but they are not used by the hyperparameter search.
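
For contrast, here is a minimal sketch of the regular-training setup those arguments are meant for, assuming the usual Trainer + EarlyStoppingCallback pattern (model, train_dataset, and eval_dataset are placeholders):

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="regular_training",
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
    load_best_model_at_end=True,        # honoured here, outside of HP search
    metric_for_best_model="eval_loss",  # also what early stopping monitors
    greater_is_better=False,
)

trainer = Trainer(
    model=model,                  # placeholder
    args=args,
    train_dataset=train_dataset,  # placeholder
    eval_dataset=eval_dataset,    # placeholder
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()  # when training ends, the best checkpoint (by eval_loss) is reloaded
```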


Thank you!
I’ve already opened an issue on GitHub: Best model not loaded in Trainer.hyperparameter_search · Issue #13902 · huggingface/transformers · GitHub

There it’s shown that load_best_model_at_end is indeed ignored and probably needs to be fixed.
