Evaluation became slower and slower during Trainer.train()

When I used Trainer.train() to fine-tune BartBase, I noticed something weird: the evaluation speed shown in the progress bar kept dropping (from 6 item/s down to 0.29 item/s). Please help me, I'm new to transformers.

Here is my code.

training_args = TrainingArguments(
    output_dir="Model/BartBase",
    overwrite_output_dir=True,
    
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=1e-5,
    num_train_epochs=20,
    lr_scheduler_type='linear',
    label_smoothing_factor=0,
    
#     logging_dir='runs',
    logging_strategy='steps', # log according to logging_steps
    logging_steps=1,
    
    save_strategy='steps', # save according to save_steps
    save_steps=4000,
    save_total_limit=10, # limit the total number of checkpoints
    
    evaluation_strategy="steps", # evaluate according to eval_steps
    eval_steps=1, # I set eval_steps=1 to debug
    eval_accumulation_steps=1,
    
    seed=42, 
    
    load_best_model_at_end=True, # load the best model according to metric_for_best_model
    metric_for_best_model='f1' # the string should be a key in the dict returned by compute_metrics
    )


from datasets import load_metric
import numpy as np

def compute_metrics(eval_pred):
    f1_metric = load_metric('f1')
    accuracy_metric = load_metric('accuracy')
    pred, label = eval_pred
    pred = np.argmax(pred, axis=-1)
    f1_score = f1_metric.compute(predictions=pred, references=label, average='micro')
    accuracy = accuracy_metric.compute(predictions=pred, references=label)
    f1_score.update(accuracy) # merge the two result dicts (dict.update returns None, so return f1_score itself)
    return f1_score
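
For reference, the Trainer hands compute_metrics a (predictions, labels) tuple, so the function can be sanity-checked outside of training on small made-up arrays (the shapes and values below are just for illustration):

# standalone sanity check with dummy data (shapes/values are made up)
dummy_logits = np.array([[0.1, 0.8, 0.1],
                         [0.7, 0.2, 0.1],
                         [0.2, 0.3, 0.5],
                         [0.6, 0.3, 0.1]])
dummy_labels = np.array([1, 0, 2, 1])
print(compute_metrics((dummy_logits, dummy_labels)))
# prints a merged dict like {'f1': ..., 'accuracy': ...}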


from transformers import Trainer
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    data_collator=collator, # optional here, since the tokenizer is already provided
    
    train_dataset=train_dataset, # torch.utils.data.dataset.Dataset
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics
)

trainer.train()

After debugging step by step, I found that:

  1. If I remove compute_metrics=compute_metrics from the Trainer, evaluation ran at normal speed (which is what makes me wonder about the workaround sketched after this list).
  2. Even with a trivially simple compute_metrics, evaluation still became slow and eventually stalled without finishing the progress bar:
    def compute_metrics(eval_pred): 
          return {'f1': 1}
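
From finding 1, I suspect the slowdown is related to the Trainer caching the prediction logits at every evaluation step once compute_metrics is set. Would something like preprocess_logits_for_metrics be the right way to avoid that (assuming my transformers version supports it)? Here is a rough sketch of what I mean; argmax_logits is just my own name for the helper:

# reduce the logits to predicted ids on the GPU, so only a small integer
# tensor (instead of the full logits) is cached and gathered for metrics
def argmax_logits(logits, labels):
    if isinstance(logits, tuple): # some model outputs arrive as a tuple
        logits = logits[0]
    return logits.argmax(dim=-1)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    data_collator=collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics, # would then receive ids, so its np.argmax would have to go
    preprocess_logits_for_metrics=argmax_logits,
)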
    

Please give me some help. Thanks a lot!!! :pray: