How to save the best trial's model using `trainer.hyperparameter_search`

I’m using hyperparameter_search for hyperparameter tuning in the following way:

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_set,
    eval_dataset=dev_set,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

best_trial = trainer.hyperparameter_search(
    backend="ray",
    direction="maximize",
    n_trials=10,
)
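
For completeness, my model_init and compute_metrics look roughly like this (the checkpoint name, num_labels, and the accuracy metric below are placeholders, not my exact setup):

import numpy as np
from transformers import AutoModelForSequenceClassification

def model_init():
    # hyperparameter_search needs a model factory so each trial starts from fresh weights;
    # "bert-base-uncased" and num_labels=2 stand in for my actual model here
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}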

Everything works well and I can see the information for the best trial in best_trial. My question is: how can I save the actual best model from that trial? I tried saving it with the trainer’s save_model, i.e. trainer.save_model(path/to/a/folder), but I get the following error:

trainer.save_model(path/to/a/folder)
  File "/home/ubuntu/anaconda3/envs/ccr/lib/python3.6/site-packages/transformers/trainer.py", line 1885, in save_model
    self._save(output_dir)
  File "/home/ubuntu/anaconda3/envs/ccr/lib/python3.6/site-packages/transformers/trainer.py", line 1930, in _save
    state_dict = self.model.state_dict()
AttributeError: 'NoneType' object has no attribute 'state_dict'

It looks like the trainer does not actually hold the best model found during hyperparameter tuning (?). My goal is simple: I want to take the best model from the hyperparameter search and evaluate it on my final test set, but I can’t find a way to save that model. I know I could read the hyperparameters off the best trial and fine-tune the model again from scratch, but I’d rather avoid that; I just want to retrieve the model that was already trained during the search (roughly what the sketch below shows). Is there any way to do that? Thanks.
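
Concretely, this is what I’m hoping to end up with; the checkpoint path is exactly the piece I’m missing, and test_set is my held-out test set:

from transformers import AutoModelForSequenceClassification, Trainer

# "path/to/best_trial_checkpoint" is the unknown part; the rest reuses the setup above
best_model = AutoModelForSequenceClassification.from_pretrained("path/to/best_trial_checkpoint")
test_trainer = Trainer(
    model=best_model,
    args=training_args,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
print(test_trainer.evaluate(eval_dataset=test_set))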


@sgugger Any thoughts on this?

There is no automatic process right now. If you set save_strategy="epoch" and save_total_limit=1, you will get a saved checkpoint of the model for each trial, and you should be able to access it at the end by looking at the checkpoint-{trial_id}-xxx folders.
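
Something along these lines should work (treat the exact folder layout as an assumption, it can vary a bit by backend and version):

import glob
import os

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hp_search_output",  # placeholder output directory
    evaluation_strategy="epoch",
    save_strategy="epoch",          # save a checkpoint at every epoch of every trial
    save_total_limit=1,             # only keep the most recent checkpoint per run
)

# After hyperparameter_search finishes, each trial leaves a checkpoint folder
# somewhere under output_dir; list them all and pick the one for the best trial.
checkpoints = glob.glob(os.path.join(training_args.output_dir, "**", "checkpoint-*"), recursive=True)
print(checkpoints)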

We’ll put making this automatic on the roadmap so it becomes easier in a future version! Hoping to have some time to work on it next week or the week after.


Hi, thanks for the great work on the hyperparameter tuning API! It’s quite nice.

Just wanted to check whether there has been any movement on this. Is there an easy way to access the best model, or, failing that, a way to reload it from its best weights?

Ideally the best model object would just be returned directly, but reinitializing it and loading it from a checkpoint would be fine as well. However, it’s not clear to me where that checkpoint would even be located.

I don’t see anything in the directory structure under TrainingArguments.output_dir that corresponds to the run_id of the BestRun object returned by my hyperparameter tuning run (I’m using the Ray backend, fwiw).
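
For what it’s worth, this is roughly how I’ve been looking for it (reusing the variable names from the original post; the recursive search is just my own guess at where the checkpoints might land):

import glob
import os

# best_trial is the BestRun returned by trainer.hyperparameter_search(...) above
print(best_trial.run_id, best_trial.hyperparameters)

# search everything under output_dir for a path that mentions the winning run_id
candidates = [path for path in glob.glob(os.path.join(training_args.output_dir, "**"), recursive=True)
              if best_trial.run_id in path]
print(candidates)  # comes back empty for me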
