My end goal is a faster learning rate decay.
The exact scheduler does not matter to me; I just want it to decay by about two orders of magnitude, from 5e-5 down to roughly 5e-7.
This was a test with an inverse sqrt scheduler to check whether the kwargs are passed correctly, but it seems to fall back to the default timescale of 10000 instead.
How can I make the scheduler accept the kwargs?
These are my Training Args:
from transformers import SchedulerType, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir=model_save_path,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=64,
    save_steps=0.06,
    save_total_limit=15,
    prediction_loss_only=True,
    logging_steps=0.01,
    lr_scheduler_type=SchedulerType.INVERSE_SQRT,
    lr_scheduler_kwargs={"timescale": 3000},  # intended to override the default timescale of 10000
    disable_tqdm=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset["train"],
)
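If the kwargs simply cannot be forwarded this way, the fallback I would try is building the optimizer and scheduler myself and handing them to the Trainer through its optimizers argument. This is only a sketch; the optimizer settings other than the learning rate are placeholders:

from torch.optim import AdamW
from transformers import get_inverse_sqrt_schedule

# Fallback sketch: construct the inverse-sqrt scheduler by hand so the
# timescale is definitely 3000, then pass both objects to the Trainer.
# Optimizer hyperparameters besides the learning rate are placeholders.
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_inverse_sqrt_schedule(optimizer, num_warmup_steps=0, timescale=3000)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset["train"],
    optimizers=(optimizer, scheduler),
)

With an explicit (optimizer, scheduler) tuple the Trainer uses these instead of creating its own from the arguments, so lr_scheduler_type and lr_scheduler_kwargs would then be ignored.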
This is an entry from my log history:
{
    "epoch": 19.024183283835384,
    "grad_norm": 1.1331897974014282,
    "learning_rate": 2.135115047368557e-05,
    "loss": 7.7059,
    "step": 44840
}
When I calculate the learning rate at that step with my timescale of 3000, I should get:
5e-5 * 1 / sqrt((44840 + 3000) / 3000) = 0.00001252088
but it seems like it is still using the default timescale of 10000, since that fits exactly:
5e-5 * 1 / sqrt((44840 + 10000) / 10000) = 0.00002135115
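As a sanity check, something like the following reproduces both values (assuming no warmup, i.e. lr = base_lr / sqrt((step + timescale) / timescale)):

import math

# Compare the logged learning rate at step 44840 against both timescales,
# assuming no warmup: lr = base_lr / sqrt((step + timescale) / timescale)
base_lr, step = 5e-5, 44840
for timescale in (3000, 10000):
    lr = base_lr / math.sqrt((step + timescale) / timescale)
    print(f"timescale={timescale}: lr={lr:.8e}")
# timescale=3000  -> ~1.25208810e-05
# timescale=10000 -> ~2.13511505e-05, which matches the logged 2.135115047368557e-05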