Hi,
I am fine-tuning Microsoft Phi-2 on my own dataset, which consists of about 2k samples.
I used the training parameters shown below:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=5,
    max_steps=-1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="steps",
    save_steps=1500,
    eval_steps=100,  # this line raises an error
    evaluation_strategy="steps",
    logging_steps=100,
    logging_strategy="steps",
    learning_rate=1e-5,
    report_to="tensorboard",
    fp16=False,
    bf16=True,
)
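(For context on the eval_steps error: evaluation_strategy="steps" requires an eval_dataset to be passed to the Trainer, and a missing one is a common cause of this error. A minimal sketch, where train_ds and eval_ds are placeholder names for tokenized dataset splits:

```python
from transformers import Trainer

# evaluation_strategy="steps" makes the Trainer run evaluation every
# eval_steps steps, which fails if no eval_dataset was provided.
trainer = Trainer(
    model=model,
    args=training_arguments,
    train_dataset=train_ds,
    eval_dataset=eval_ds,  # required when evaluation_strategy="steps"
)
trainer.train()
```
)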
Now I changed the number of epochs and the learning rate, thinking that if I decreased the learning rate from 2e-4 to 1e-5, the training loss would go lower, but that was not the case. I'll show you the TensorBoard graph: the blue curve is the run with a learning rate of 2e-4, and the pink one is 1e-5.
So shouldn't decreasing the learning rate also decrease the training loss?
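(One way to see why a smaller learning rate does not automatically give a lower training loss: with a fixed step/epoch budget, each update moves the weights less, so the run makes less progress in the same number of steps. A toy sketch in pure Python, not Phi-2 itself, using plain gradient descent on f(w) = w**2 with the same two rates:

```python
# Toy illustration: gradient descent on f(w) = w**2 with a fixed step budget.
# A smaller learning rate takes smaller steps, so after the same number of
# steps the remaining loss is *higher*, not lower.
def final_loss(lr, steps=1000, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w**2 is 2w
    return w * w

loss_fast = final_loss(2e-4)  # analogous to the blue (2e-4) run
loss_slow = final_loss(1e-5)  # analogous to the pink (1e-5) run
print(loss_fast < loss_slow)  # True: the smaller rate leaves a higher loss
```

A smaller learning rate can reach a lower loss eventually, but it typically needs proportionally more steps to get there.)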