I've been trying to perform a simple hyperparameter search on `distilroberta-base`. When using Kaggle notebooks there is a 20 GB limit on outputs, and even when I run only 5 trials the output directory fills up. I am used to GridSearchCV, which only keeps track of the best parameters and retrains at the end, without saving individual models during the search.
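For reference, this is the scikit-learn behaviour I mean: GridSearchCV only records scores and parameters per candidate and (with the default `refit=True`) retrains a single final model at the end, so nothing is written to disk during the search. A minimal toy example:

```python
# GridSearchCV keeps only scores/params per candidate in memory and,
# with refit=True (the default), fits one final model on the best
# parameters at the end -- no per-candidate models are saved to disk.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,
)
search.fit(X, y)

print(search.best_params_)  # only the winning parameters are kept
```

This is the behaviour I'd like to reproduce with `Trainer.hyperparameter_search`.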
I have tried setting `overwrite_output_dir=True` in `TrainingArguments`, but this doesn't seem to reduce the output at all.
I apologize if this is covered somewhere in the documentation, but I have not been able to find it.
```python
model_training_arguments = TrainingArguments(
    "./model_output",
    evaluation_strategy="epoch",
    fp16=True,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=4,
    seed=2,
    load_best_model_at_end=True,
    overwrite_output_dir=True,
)

model_trainer = Trainer(
    model_init=model_init,
    args=model_training_arguments,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    tokenizer=tokenizer,
    compute_metrics=metrics,
)

model_trainer.hyperparameter_search(
    direction="minimize",
    backend="optuna",
    n_trials=5,
)
```
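In case it helps: if I understand the docs correctly, `save_total_limit` caps how many checkpoints are kept in `output_dir` (older ones get deleted), which sounds closer to what I need than `overwrite_output_dir`. An untested sketch of the arguments I'm considering (the specific values are my guesses):

```python
model_training_arguments = TrainingArguments(
    "./model_output",
    evaluation_strategy="epoch",
    save_strategy="epoch",   # checkpointing cadence must match evaluation
                             # when load_best_model_at_end=True
    save_total_limit=1,      # delete older checkpoints, keeping only the
                             # most recent (plus the best, when tracked)
    load_best_model_at_end=True,
    ...
)
```

Can anyone confirm whether this keeps the search within the disk quota, or whether each trial still writes its own checkpoints?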
Thanks in advance!