I am building a Longformer-based classification model similar to this. If I want to tune the model, which parameters should I consider, and are there any recommended values for them?
Currently I am considering the parameters and values below (a sketch of how I am setting them follows the list):
- `attention_window`: 256, 512, or 1024
- `optim`: adamw_torch, adamw_apex_fused, or adafactor
- `weight_decay`: 0, 0.01, or 0.02
- `learning_rate`: 5e-5 or 10e-5
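For context, here is roughly how I am wiring these in: `attention_window` is a model config parameter, while the other three go through `TrainingArguments`. This is a minimal sketch assuming a Hugging Face `Trainer` setup; the checkpoint name, `output_dir`, and `num_labels` are placeholders, not my actual values.

```python
from transformers import (
    LongformerConfig,
    LongformerForSequenceClassification,
    TrainingArguments,
)

# Placeholder checkpoint; substitute whatever base model is being fine-tuned.
checkpoint = "allenai/longformer-base-4096"

# attention_window lives on the model config, not on TrainingArguments.
config = LongformerConfig.from_pretrained(
    checkpoint,
    attention_window=512,  # trying 256, 512, or 1024
    num_labels=2,          # placeholder; set to the actual number of classes
)
model = LongformerForSequenceClassification.from_pretrained(checkpoint, config=config)

# The remaining three candidates are TrainingArguments fields.
training_args = TrainingArguments(
    output_dir="./results",  # placeholder path
    optim="adamw_torch",     # trying adamw_torch, adamw_apex_fused, adafactor
    weight_decay=0.01,       # trying 0, 0.01, 0.02
    learning_rate=5e-5,      # trying 5e-5, 10e-5
)
```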
What other parameters should I tune? Do I need to tune any of these (see the search sketch after the list)?
- `num_train_epochs`
- `per_device_train_batch_size`
- `gradient_accumulation_steps`
- `per_device_eval_batch_size`
- `warmup_steps`
- `dataloader_num_workers`
- `lr_scheduler_type`
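For how I would actually run the sweep: I was planning to use `Trainer.hyperparameter_search` with the Optuna backend. This is a minimal sketch continuing from the snippet above; `train_dataset`, `eval_dataset`, and the epoch/warmup ranges are placeholder assumptions on my part.

```python
from transformers import Trainer

def model_init():
    # Fresh model per trial, reusing checkpoint/config from the sketch above.
    return LongformerForSequenceClassification.from_pretrained(checkpoint, config=config)

def hp_space(trial):
    # Optuna search space mirroring the candidate values listed above.
    return {
        "learning_rate": trial.suggest_categorical("learning_rate", [5e-5, 10e-5]),
        "weight_decay": trial.suggest_categorical("weight_decay", [0.0, 0.01, 0.02]),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_steps": trial.suggest_categorical("warmup_steps", [0, 100, 500]),
    }

trainer = Trainer(
    model_init=model_init,        # required so each trial starts from scratch
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized training split
    eval_dataset=eval_dataset,    # placeholder: tokenized validation split
)

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    n_trials=10,
    direction="minimize",  # minimizes eval loss by default
    backend="optuna",
)
print(best_run.hyperparameters)
```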
Please also let me know if there is any documentation on tuning these parameters.