Adding compute_metrics produces CUDA OutOfMemoryError

Hi all, I'm currently fine-tuning bert-base-uncased (max_length 256, batch_size 16) on the Winogrande dataset in Google Colab. I'm trying to log training and validation accuracy with a compute_metrics function. However, as soon as I pass compute_metrics to the Trainer and start training, a CUDA OutOfMemoryError is raised. Training runs smoothly when I comment compute_metrics out. I've tried changing the hyperparameters (e.g. batch_size 16 → 8), but still no luck. My code is below; please let me know if there is more information I should provide.
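For context, the setup around the snippet looks roughly like the following. This is a simplified sketch: the values of LEARNING_RATE, WEIGHT_DECAY and NUM_EPOCHS are placeholders here, and the quantization step that produces quantized_model and the exact Winogrande preprocessing that produces tokenized_dataset are omitted.

import numpy as np
from datasets import load_metric
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-uncased"
MAX_LENGTH = 256
BATCH_SIZE = 16
LEARNING_RATE = 2e-5   # placeholder value
WEIGHT_DECAY = 0.01    # placeholder value
NUM_EPOCHS = 3         # placeholder value

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# bert-base-uncased with a 2-way classification head; the quantization that
# yields quantized_model and the Winogrande tokenization (truncated to
# MAX_LENGTH) that yields tokenized_dataset are not shown here.
quantized_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2
)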

metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # predictions are the raw logits, so take the argmax over the class dimension
    preds = np.argmax(predictions, axis=1)
    accuracy = metric.compute(predictions=preds, references=labels)
    return accuracy

training_args = TrainingArguments(
    output_dir="fine_tuned_bert_model",
    evaluation_strategy="epoch",
    learning_rate=LEARNING_RATE,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    weight_decay=WEIGHT_DECAY,
    save_total_limit=3,
    num_train_epochs=NUM_EPOCHS,
    fp16=True,
    logging_steps=1,
)

trainer = Trainer(
    model=quantized_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    # compute_metrics=compute_metrics,   # <-- uncommenting this causes the CUDA OOM
)

trainer.train()

Thanks!