How to use an adaptive learning rate during training?

Hi,
I am trying to train a Siamese network and want to use an adaptive learning rate during training.
I have a SiameseTrainer, which is a subclass of the Trainer class. My question is: am I doing something wrong that causes the error

TypeError: Object of type Tensor is not JSON serializable

in this line of my code:

train_result = trainer.train(resume_from_checkpoint=checkpoint)

My code runs for a few epochs and then crashes. Here is what I am doing:

import numpy as np
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction):
  correct_predictions = p.predictions[1] if isinstance(p.predictions, tuple) else p.predictions
  labels = p.label_ids
  return {
    "MSE": np.mean((correct_predictions - labels) ** 2)
  }

from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
            model.parameters(),
            scale_parameter=True,  # if True, the learning rate is scaled by the root mean square
        )
lr_scheduler = AdafactorSchedule(optimizer)

trainer = SiameseTrainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            data_collator=data_collator,
            compute_metrics=compute_metrics,
            tokenizer=tokenizer,
            callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
            optimizers=(optimizer, lr_scheduler)
)

When I comment out the optimizers=(optimizer, lr_scheduler) argument and run the code, it no longer crashes in the middle of training.

Appreciate any help!


Hi anzaman,

I am stuck on the same problem as you. Have you fixed it yet?

This is my error.

########## UPDATE
I found a way to work around it by adding logging_steps=9999999 to the trainer arguments so that it does not log during training, but is there a better solution for this?
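For reference, the workaround described above looks roughly like this (a hypothetical fragment: output_dir and num_train_epochs are placeholders for your own setup):

```python
from transformers import TrainingArguments

# Workaround sketch: set logging_steps so high that the trainer never
# logs mid-training, so the Tensor learning rate reported by the
# scheduler never reaches the JSON-serialized trainer state.
training_args = TrainingArguments(
    output_dir="out",        # placeholder
    num_train_epochs=3,      # placeholder
    logging_steps=9999999,   # effectively disables step logging
)
```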

Your trick worked for me too. Do you know if there are any updates that allow using Adafactor in training and logging successfully without tripping TypeError: Object of type Tensor is not JSON serializable?
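Until such a fix lands, one possible workaround, assuming a Tensor value in the logs is the culprit, is to cast tensor-like values to plain Python numbers before the trainer serializes them. The function name is illustrative, and FakeTensor below only stands in for a real 0-d torch Tensor or NumPy scalar:

```python
import json

class FakeTensor:
    # Minimal stand-in for a 0-d torch Tensor / NumPy scalar,
    # used here only to keep the sketch self-contained.
    def __init__(self, value):
        self.value = value

    def item(self):
        return self.value

def sanitize_logs(logs):
    # Anything exposing .item() (torch Tensors, NumPy scalars) is
    # collapsed to a plain Python number; everything else passes through.
    return {k: v.item() if hasattr(v, "item") else v for k, v in logs.items()}

logs = sanitize_logs({"loss": 0.5, "learning_rate": FakeTensor(0.001)})
print(json.dumps(logs))  # serializes cleanly now
```

You could then call sanitize_logs from an overridden log method on your Trainer subclass so every logging call is cleaned before it touches trainer.state.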

I have the same problem.

I ran into an issue like this when I inadvertently logged a metric in the trainer that was a Tensor instead of a plain float. I did something like average_scores.mean(), which returned a Tensor object that got attached to Trainer.state and then broke JSON saving. I changed the metric computation to average_scores.mean().item() and this fixed the issue.
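Applied to the compute_metrics from the original post, the fix would look roughly like this. This is a sketch: the function name is illustrative, it takes raw arrays rather than an EvalPrediction for simplicity, and the .item() call is the only substantive change:

```python
import numpy as np

def compute_metrics_fixed(predictions, labels):
    # np.mean returns a 0-d NumPy scalar; .item() converts it to a
    # plain Python float so the trainer state stays JSON serializable.
    mse = np.mean((predictions - labels) ** 2)
    return {"MSE": mse.item()}
```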