Hello!
I fine-tuned a DeBERTa model (microsoft-deberta-v3-base) and saved a checkpoint from it for inference.
Now I would like to do hyperparameter optimization with Ray Tune. My plan was to load the model checkpoint and call hyperparameter_search,
but when I try it, I get this error:
TuneError Traceback (most recent call last)
<ipython-input-35-8e89fe27cbea> in <module>
23 )
24
---> 25 best_run = trainer.hyperparameter_search(
26 direction="maximize",
27 n_trials=2
2 frames
/usr/local/lib/python3.8/dist-packages/transformers/trainer.py in hyperparameter_search(self, hp_space, compute_objective, n_trials, direction, backend, hp_name, **kwargs)
2414 HPSearchBackend.WANDB: run_hp_search_wandb,
2415 }
-> 2416 best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
2417
2418 self.hp_search_backend = None
/usr/local/lib/python3.8/dist-packages/transformers/integrations.py in run_hp_search_ray(trainer, n_trials, direction, **kwargs)
336 dynamic_modules_import_trainable.__mixins__ = trainable.__mixins__
337
--> 338 analysis = ray.tune.run(
339 dynamic_modules_import_trainable,
340 config=trainer.hp_space(None),
/usr/local/lib/python3.8/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, chdir_to_trial_dir, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, max_concurrent_trials, _experiment_checkpoint_dir, _remote, _remote_string_queue)
754 if incomplete_trials:
755 if raise_on_failed_trial and not state["signal"]:
--> 756 raise TuneError("Trials did not complete", incomplete_trials)
757 else:
758 logger.error("Trials did not complete: %s", incomplete_trials)
TuneError: ('Trials did not complete', [_objective_84707_00000, _objective_84707_00001])
I tried to build my code on the two sources listed under References so I could use Ray as the hyperparameter optimizer, but I'm stuck and don't know how to proceed. To use the Ray backend, do I need to define the config (hp_space) function? I sketched what I think it would look like after my code below.
I used the first example in the Hugging Face docs as a base and it worked fine with the GLUE dataset, but when I tried to replicate it with the model I used and another dataset, I got this same error.
This is my code:
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model_checkpoint = '/content/microsoft-deberta-v3-base_dataset_size-200_epochs-2_batch_size-32'

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=3)

args = TrainingArguments(
    output_dir="/content/ZeroTraining/",
    num_train_epochs=config["num_epochs"],
    per_device_train_batch_size=per_device_train_batch_size,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,
    disable_tqdm=True,
)

trainer = Trainer(
    args=args,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    model_init=model_init,
    compute_metrics=compute_metrics,
)

best_run = trainer.hyperparameter_search(
    direction="maximize",
    n_trials=2,
)
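From the Ray Tune blog post, I think the search space (the config function) would be passed like this, but I'm not sure if this is what's missing. The function name, parameter names and ranges below are just my guess:

    from ray import tune

    # I think the keys have to match TrainingArguments field names, e.g. num_train_epochs
    def ray_hp_space(trial):
        return {
            "learning_rate": tune.loguniform(1e-5, 5e-5),
            "num_train_epochs": tune.choice([2, 3, 4]),
            "per_device_train_batch_size": tune.choice([16, 32]),
        }

    best_run = trainer.hyperparameter_search(
        hp_space=ray_hp_space,
        backend="ray",
        direction="maximize",
        n_trials=2,
    )

If that is right, I guess the hard-coded config["num_epochs"] line in my TrainingArguments would not be needed anymore?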
Can anybody help me?
References
- Hyperparameter Search with Transformers and Ray Tune (huggingface.co)
- notebooks/text_classification.ipynb at main · huggingface/notebooks (github.com)
During these attempts, some doubts came up about the implementation, among them:
- Can I optimize the hyperparameters together with the training?
- Can I save the best parameters together with a checkpoint and load the best model?
- How do I access the model for loading with a run id?
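For the last two questions, my understanding from the docs is that after the search I could copy the winning values back into the TrainingArguments, retrain once, and save that run as a normal checkpoint. This is only a sketch of what I mean (the save path is just an example, not from my real script), please correct me if this is the wrong way:

    from transformers import AutoModelForSequenceClassification

    # continuing from the code above: best_run is returned by trainer.hyperparameter_search
    print(best_run.run_id, best_run.objective, best_run.hyperparameters)

    # copy the best hyperparameters back into the TrainingArguments
    for name, value in best_run.hyperparameters.items():
        setattr(trainer.args, name, value)

    # retrain once with those values and save it as a regular checkpoint
    trainer.train()
    trainer.save_model("/content/ZeroTraining/best_run")  # example path

    # later, load it by path like any other checkpoint
    model = AutoModelForSequenceClassification.from_pretrained("/content/ZeroTraining/best_run")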