Fine-tuning Longformer

I am building a Longformer-based classification model similar to this. If I want to tune my model, which parameters do I need to consider, and are there any recommended values for them?

Currently I am thinking about the following parameters and values:

attention_window = 256, 512, or 1024
optim = "adamw_torch", "adamw_apex_fused", or "adafactor"

What other parameters should I tune? Do I need to tune any of these?
num_train_epochs, per_device_train_batch_size, gradient_accumulation_steps, per_device_eval_batch_size, warmup_steps, dataloader_num_workers, lr_scheduler_type

Please let me know if there is any documentation about hyperparameter tuning.
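For context, the candidates above (apart from attention_window) map onto `TrainingArguments`. A rough sketch; every value here is an illustrative placeholder, not a recommendation:

```python
from transformers import TrainingArguments

# Illustrative starting point only; none of these values are tuned.
training_args = TrainingArguments(
    output_dir="longformer-finetune",
    optim="adamw_torch",                # or "adamw_apex_fused", "adafactor"
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,      # effective train batch size = 8 * 4 = 32
    per_device_eval_batch_size=16,
    warmup_steps=500,
    dataloader_num_workers=4,
    lr_scheduler_type="linear",
)

# attention_window is a model config parameter, not a training argument,
# so it is set when building the model, e.g.:
# LongformerConfig(attention_window=512)
```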


The Trainer supports hyperparameter search using Optuna or Ray Tune. Check the bottom of the official text classification notebook for more info.
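A minimal sketch of that search, assuming you already have a `model_init` function that builds a fresh classification model and a tokenized `DatasetDict` called `tokenized_ds` (both names are placeholders, not from the notebook):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="longformer-hp-search",
    evaluation_strategy="epoch",
)

# model_init (rather than model=) is required so that each trial
# starts from a freshly initialized model.
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["validation"],
)

# Runs 10 trials with Optuna's default search space.
best_run = trainer.hyperparameter_search(
    backend="optuna",
    n_trials=10,
    direction="minimize",   # minimize the eval objective (loss by default)
)
print(best_run.hyperparameters)
```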


It seems that the only parameters being tuned are the ones below. Is there any reason to tune only those? Should I look into tuning others?
hyperparameters={'learning_rate': 4.357724525964853e-05, 'num_train_epochs': 2, 'seed': 38, 'per_device_train_batch_size': 32}
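Those four parameters are just the default search space that `hyperparameter_search` falls back to when you don't supply one. You can pass your own `hp_space` function to tune additional `TrainingArguments` fields. A sketch using the Optuna-style trial API; the ranges below are illustrative assumptions, not tuned recommendations:

```python
def hp_space(trial):
    """Custom Optuna search space; ranges are illustrative, not tuned."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, 1000, step=100),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
    }

# Then pass it to the search:
# trainer.hyperparameter_search(backend="optuna", hp_space=hp_space, n_trials=20)
```

Note that the returned keys must be `TrainingArguments` names; something like `attention_window` is a model config parameter, so it would need to be varied inside `model_init` instead.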