Trainer lr scheduler does not get kwargs

My end goal is a faster learning rate decay.
The particular scheduler does not matter to me; I just want the learning rate to decay by about two orders of magnitude, from 5e-5 down to roughly 5e-7.

This was a test with an inverse sqrt scheduler to see whether the kwargs are passed through correctly, but it seems to fall back to the default timescale of 10000 instead.

How can I make the scheduler accept the kwargs?

These are my Training Args:

from transformers import Trainer, TrainingArguments, SchedulerType

training_args = TrainingArguments(
    output_dir=model_save_path,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=64,
    save_steps=0.06,
    save_total_limit=15,
    prediction_loss_only=True,
    logging_steps=0.01,
    lr_scheduler_type=SchedulerType.INVERSE_SQRT,
    lr_scheduler_kwargs={"timescale": 3000},  # expected to override the default timescale of 10000
    disable_tqdm=True,
)


trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset["train"], 
)
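
If the kwargs are genuinely not forwarded in my transformers version, the workaround I am considering is to build the optimizer and scheduler myself and hand them to the Trainer via its optimizers argument, so the timescale is set explicitly. This is only a sketch (it assumes get_inverse_sqrt_schedule from transformers.optimization and no warmup), not something I have verified:

import torch
from transformers import Trainer
from transformers.optimization import get_inverse_sqrt_schedule

# build the optimizer by hand with the same initial learning rate
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# inverse sqrt schedule with the timescale I actually want
scheduler = get_inverse_sqrt_schedule(
    optimizer,
    num_warmup_steps=0,
    timescale=3000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset["train"],
    optimizers=(optimizer, scheduler),  # Trainer uses these instead of creating its own
)

As far as I understand, lr_scheduler_type and lr_scheduler_kwargs in the TrainingArguments would then simply be ignored, but I would prefer the kwargs to just work.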

This is an entry from my log history:

{
    "epoch": 19.024183283835384,
    "grad_norm": 1.1331897974014282,
    "learning_rate": 2.135115047368557e-05,
    "loss": 7.7059,
    "step": 44840
}

When I calculate the learning rate at that step with my timescale of 3000, I should get:
5e-5 * 1/sqrt((44840 + 3000)/3000) = 0.00001252088

but it seems it is still using the default timescale of 10000, since this matches the logged value exactly:
5e-5 * 1/sqrt((44840 + 10000)/10000) = 0.00002135115
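
For reference, the same two calculations in plain Python, just plugging both timescales into the decay formula 1/sqrt((step + timescale)/timescale) (assuming no warmup):

import math

lr0 = 5e-5
step = 44840

# what I expect with timescale=3000
print(lr0 / math.sqrt((step + 3000) / 3000))    # ~1.252e-05

# what the log actually shows, matching the default timescale=10000
print(lr0 / math.sqrt((step + 10000) / 10000))  # ~2.135e-05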