I am using Trainer and the logged learning_rate stays at 0 for roughly the first 1.5 epochs:

```
{'loss': 7.1495, 'learning_rate': 0.0, 'epoch': 0.06}
{'loss': 7.1167, 'learning_rate': 0.0, 'epoch': 0.32}
{'loss': 7.3134, 'learning_rate': 0.0, 'epoch': 0.63}
{'loss': 7.4039, 'learning_rate': 0.0, 'epoch': 0.95}
{'loss': 7.394, 'learning_rate': 0.0, 'epoch': 1.27}
{'loss': 5.4542, 'learning_rate': 0.00019523809523809525, 'epoch': 1.59}
{'loss': 0.5178, 'learning_rate': 0.00017142857142857143, 'epoch': 1.9}
```

My trainer arguments are:

```
gradient_accumulation_steps: 4
num_train_epochs: 3
logging_steps: 5
output_dir: "output_dir"
save_strategy: 'no'
per_device_train_batch_size: 2
per_device_eval_batch_size: 1
logging_dir: './logs'
report_to: 'wandb'
logging_first_step: True
optim: "paged_adamw_8bit"
learning_rate: 2e-4
weight_decay: 0.01
fp16: True
bf16: False
max_grad_norm: 0.3
max_steps: -1
#warmup_ratio: 0.003
lr_scheduler_type: 'linear'
evaluation_strategy: 'no'
eval_accumulation_steps: 2
warmup_steps: 3
```
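As a sanity check, here is a minimal sketch of what a linear schedule with warmup should produce for these settings. It is a plain-Python reimplementation of the usual linear warmup/decay formula, not the Trainer's own code. The function name `linear_lr` is hypothetical, and `total_steps=45` is an assumption inferred from the logs (about 15 optimizer steps per epoch over 3 epochs), not a value printed by the Trainer.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=3, total_steps=45):
    """Learning rate after `step` scheduler steps: linear warmup, then linear decay.

    total_steps=45 is an assumption inferred from the logged epoch values.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Print the expected learning rate at a few scheduler steps.
for step in (0, 1, 3, 4, 9):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate should only be 0 before the first scheduler step and should reach 2e-4 after 3 steps. The two non-zero values in my logs match scheduler steps 4 and 9, although they were logged at roughly optimizer steps 25 and 30, so the scheduler seems to have started advancing about 21 steps late. I have not confirmed why.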

I tried changing lr_scheduler_type (linear, cosine), warmup_ratio and warmup_steps, optim, etc., but the problem still happens.

Any ideas?

I'm also using Accelerate with the DeepSpeed integration and multiple GPUs for training.