Say I create the following `Seq2SeqTrainingArguments` instance:

```python
Seq2SeqTrainingArguments(
    adafactor=True,
    optim="adafactor",
    learning_rate=1e-4,
)
```
In this case, I am not sure whether `learning_rate` is actually used anywhere. From the `Seq2SeqTrainingArguments` documentation:

- learning_rate (`float`, optional, defaults to 5e-5) — The initial learning rate for AdamW optimizer.
Does this mean that `learning_rate` is completely ignored when using Adafactor?
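For what it's worth, when I instantiate `Adafactor` from `transformers.optimization` directly, it does accept an explicit `lr` as long as `relative_step=False` (with `relative_step=True`, the default, it requires `lr=None` and computes the step size internally). So my guess, which I'd like confirmed, is that whether `learning_rate` is honored depends on how the `Trainer` sets those flags. A minimal check with a dummy model:

```python
import torch
from transformers.optimization import Adafactor

# Tiny stand-in model just to have parameters to optimize.
model = torch.nn.Linear(4, 2)

# With relative_step=False, Adafactor uses the externally supplied lr;
# with relative_step=True (the default), lr must be None and the step
# size is computed internally instead.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)

print(optimizer.defaults["lr"])
```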
Thank you!