Hyperparameter search while adding special tokens

Hi,
I am using Ray Tune to do a hyperparameter search for a DistilRoBERTa classification model. Here is an overview of my code:

# imports used below
import copy

from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
from transformers import (AutoModelForSequenceClassification,
                          EarlyStoppingCallback, Trainer)

# define the model_init function passed to the Trainer
def get_model(params):
    # copy the base config so one trial's dropout values don't leak into
    # the shared `config` object used by later trials
    db_config = copy.deepcopy(config)
    if params is not None:
        db_config.update({'attention_probs_dropout_prob': params['attention_drop_out'],
                          'hidden_dropout_prob': params['hidden_drop_out']})
    model = AutoModelForSequenceClassification.from_pretrained(
        model_args.model_name_or_path,
        config=db_config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
        ignore_mismatched_sizes=model_args.ignore_mismatched_sizes,
    )

    # grow the embedding matrix to match the tokenizer after adding special tokens
    if special_tokens is not None:
        model.resize_token_embeddings(len(tokenizer))

    # setup label_to_id
    model.config.label2id = label_to_id
    model.config.id2label = {
        id: label for label, id in model.config.label2id.items()}
    return model
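
For context, `tokenizer` and `special_tokens` are created earlier (not shown above); the setup looks roughly like this — the token strings here are just placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)
special_tokens = ["<ent>", "</ent>"]  # placeholder token strings
tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
# len(tokenizer) is now larger than config.vocab_size, hence the
# resize_token_embeddings call in get_model above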

def ray_hp_space(trial):
    # search space handed to trainer.hyperparameter_search via `hp_space`
    return {
        "attention_drop_out": tune.uniform(0.1, 0.5),
        "hidden_drop_out": tune.uniform(0.1, 0.5),
        "learning_rate": tune.uniform(1e-5, 2e-5),
        "weight_decay": tune.uniform(0.005, 0.01),
        "gradient_accumulation_steps": tune.choice([1, 2, 4]),
        "label_smoothing_factor": tune.choice([.7, .8, .9, .91]),
    }
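
My understanding (an assumption on my part from reading the Trainer source, not confirmed) is that keys matching TrainingArguments fields are applied to the training args for each trial, while a key like attention_drop_out only reaches get_model through `params`. Roughly (sampled_params is a made-up example draw, not library code):

# rough illustration of how a trial's sampled values get split between
# the training args and model_init
sampled_params = {"learning_rate": 1.5e-5, "attention_drop_out": 0.2}  # made-up draw
for key, value in sampled_params.items():
    if hasattr(training_args, key):
        setattr(training_args, key, value)  # e.g. learning_rate
    # attention_drop_out has no TrainingArguments field, so it is only
    # seen by get_model via the params dict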


trainer = Trainer(
    model_init=get_model,
    args=training_args,
    train_dataset=train_dataset if training_args.do_train else None,
    eval_dataset=validation_dataset if training_args.do_eval else None,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=7)],
)

scheduler = ASHAScheduler(
    metric="f1",
    mode="max",
    max_t=1,
    grace_period=1,
    reduction_factor=2,
)

reporter = CLIReporter(
    parameter_columns={
        "weight_decay": "w_decay",
        "learning_rate": "lr",
        "gradient_accumulation_steps": "gradient_accum_steps",
        "label_smoothing_factor": "label_smooth",
        "hidden_drop_out": "hidden_drop_out",
        "attention_drop_out": "attention_drop_out",
    },
    metric_columns=["eval_accuracy", "eval_f1", "eval_loss", "steps"],
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    hp_space=ray_hp_space,
    n_trials=1,
    resources_per_trial={"cpu": 2, "gpu": 1},
    scheduler=scheduler,
    keep_checkpoints_num=1,
    checkpoint_score_attr="training_iteration",
    progress_reporter=reporter,
    local_dir="experiments/ray-tune-results/",
)
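
For what it's worth, hyperparameter_search returns a BestRun named tuple, so the winning settings can be read back like this (a minimal sketch):

# best_trial is a transformers.trainer_utils.BestRun
print(best_trial.run_id)           # id of the best Ray trial
print(best_trial.objective)        # best value of the objective (here: maximized)
print(best_trial.hyperparameters)  # dict of the sampled hyperparameters
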
The problem: at some point during the search, a new model instance is created from the original config (without the additional special tokens), and the weights are copied over from the current instance (which does have the additional special tokens), which throws a size-mismatch error. It seems only the first instance should be created from the original config; every instance after that should use the current config (with the additional special tokens). How can I fix this?
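
One workaround I am considering (a rough, untested sketch; "distilroberta-resized" is a placeholder path) is to resize the embeddings once, save that checkpoint, and have get_model load from it, so the config and the weights always agree on the vocabulary size:

from transformers import AutoConfig, AutoModelForSequenceClassification

# one-time setup: resize the embeddings and save the checkpoint, so that
# every later from_pretrained call sees a config whose vocab_size already
# includes the special tokens; "distilroberta-resized" is a placeholder
base_model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path, config=config)
base_model.resize_token_embeddings(len(tokenizer))
base_model.save_pretrained("distilroberta-resized")
tokenizer.save_pretrained("distilroberta-resized")

def get_model(params):
    # every trial now loads from the resized checkpoint
    db_config = AutoConfig.from_pretrained("distilroberta-resized")
    if params is not None:
        db_config.update({"attention_probs_dropout_prob": params["attention_drop_out"],
                          "hidden_dropout_prob": params["hidden_drop_out"]})
    return AutoModelForSequenceClassification.from_pretrained(
        "distilroberta-resized", config=db_config)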

Bumping this, did you find a solution?