Huggingface transformers longformer optimizer warning AdamW

nitempe · February 14, 2022, 3:21pm

I get the warning below when I try to run the code from this page.

/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use thePyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,

I am super confused because the code doesn’t seem to set the optimizer at all. The most likely places where the optimizer gets set are below, but I don’t know how to change it there.

# define the training arguments
training_args = TrainingArguments(
    output_dir='/media/data_files/github/website_tutorials/results',
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
    disable_tqdm=False,
    load_best_model_at_end=True,
    warmup_steps=200,
    weight_decay=0.01,
    logging_steps=4,
    fp16=True,
    logging_dir='/media/data_files/github/website_tutorials/logs',
    dataloader_num_workers=0,
    run_name='longformer-classification-updated-rtx3090_paper_replication_2_warm'
)

# instantiate the trainer class and check for available devices
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data,
    eval_dataset=test_data
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

I tried another model, distilbert-base-uncased, with identical code, and it ran without any warnings.

Is this warning specific to Longformer?
How should I change the optimizer?

sgugger · February 14, 2022, 4:13pm

It’s a deprecation warning, so you will only get it once (that’s why you don’t see it for DistilBERT). To switch optimizers, pass optim="adamw_torch" in your TrainingArguments (the default is "adamw_hf").

Gerwin · April 25, 2022, 8:30am

Under the hood, if you do not specify an optimizer/scheduler in the Trainer class, it will create an instance of AdamW with a linear scheduler. To avoid the warning, you can supply your own by passing the argument optimizers=(your_optimizer, your_scheduler) to the Trainer.
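
As a rough illustration of that approach, here is a sketch that builds a PyTorch AdamW optimizer and a linear warmup/decay schedule by hand. The tiny Linear model, the learning rate, and total_steps are stand-ins chosen for illustration, not values from the thread. Note also that this simple version applies weight decay to every parameter.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Stand-in model so the snippet runs on its own; in the thread this
# would be the Longformer classification model.
model = torch.nn.Linear(4, 2)

warmup_steps = 200   # same value as warmup_steps in the question
total_steps = 1000   # hypothetical; depends on dataset size, batch size and epochs

# PyTorch's own AdamW, which is what the warning recommends.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

def linear_warmup_then_decay(step):
    """Scale factor for the base learning rate at a given step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda=linear_warmup_then_decay)

# The pair is then handed to the Trainer, for example:
# trainer = Trainer(model=model, args=training_args, ...,
#                   optimizers=(optimizer, scheduler))
```

With an explicit optimizer passed in, the Trainer has no need to create the deprecated one itself.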
