KeyError: 'eval_wer' when fine-tuning and evaluating Wav2Vec2

Hello everyone,

I was following the blog to fine-tune a Wav2Vec2 model on a custom dataset for domain adaptation, but I ran into KeyError: 'eval_wer' during evaluation.

The code is almost the same as in the blog, with a few changes:

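import numpy as np
from evaluate import load  # assumed source of `load`; not shown in my original snippet


def compute_metrics(pred):
    """
    Decode a batch of predictions and labels and compute the WER.

    :param pred: evaluation output with `predictions` (logits) and `label_ids`
    :return: dict mapping "wer" to the word error rate
    :rtype: dict
    """
    # `processor` is the Wav2Vec2 processor defined elsewhere in the script
    wer_metric = load("wer")
    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    # -100 marks padded label positions; map them back to the pad token id
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids)
    # we do not want to group tokens when computing the metrics
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    # returning {"eval_wer": wer} instead did not work either
    return {"wer": wer}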

training_args = TrainingArguments(
    output_dir=f"../ft-models/asr/{args.model_card}",
    group_by_length=True,
    per_device_train_batch_size=batch,
    num_train_epochs=epochs,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    fp16=True,
    gradient_checkpointing=True,
    do_train=True,
    do_eval=True,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=10,
    save_total_limit=2,
    logging_dir='../logs',
    data_seed=42,
    metric_for_best_model="wer",
    greater_is_better=False,
    seed=42,
    report_to="none",
    load_best_model_at_end=True)
trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    tokenizer=processor.feature_extractor,
    train_dataset=cus_dataset["train"],
    eval_dataset=cus_dataset["test"])
trainer.train()

Here is the full error message:

Traceback (most recent call last):
  File "fine_tune_asr.py", line 332, in <module>
    trainer.train()
  File "../lib/python3.8/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "../lib/python3.8/site-packages/transformers/trainer.py", line 1832, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "../lib/python3.8/site-packages/transformers/trainer.py", line 2042, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "../lib/python3.8/site-packages/transformers/trainer.py", line 2160, in _save_checkpoint
    metric_value = metrics[metric_to_check]
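
For context on the last frame: as far as I understand, the Trainer reports evaluation metrics under keys prefixed with `eval_`, and it adds the same prefix to `metric_for_best_model` before looking the value up when saving a checkpoint. The sketch below is a simplified stand-in for that lookup, not the actual Trainer code; the helper names and the metric values are made up for illustration.

```python
def resolve_metric_key(metric_for_best_model: str) -> str:
    """Return the key that is looked up in the evaluation metrics dict (simplified)."""
    if not metric_for_best_model.startswith("eval_"):
        return f"eval_{metric_for_best_model}"
    return metric_for_best_model


def lookup_best_metric(metrics: dict, metric_for_best_model: str) -> float:
    """Hypothetical helper: same lookup, but with a more descriptive error."""
    key = resolve_metric_key(metric_for_best_model)
    if key not in metrics:
        raise KeyError(f"{key!r} not found; available keys: {sorted(metrics)}")
    return metrics[key]


# Illustrative metrics dict in which the WER was reported (values are made up).
with_wer = {"eval_loss": 0.42, "eval_wer": 0.31}
# Illustrative metrics dict without a WER entry, e.g. only the loss was reported.
without_wer = {"eval_loss": 0.42}

print(lookup_best_metric(with_wer, "wer"))
try:
    lookup_best_metric(without_wer, "wer")
except KeyError as err:
    print("lookup failed:", err)
```

So returning `{"wer": wer}` from `compute_metrics` should be enough for the key name; the error means the metrics dict available at checkpoint time had no `eval_wer` entry at all.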

It worked fine about 2 months ago. I probably upgraded something in between, and I wonder whether some logic has changed. If so, how do I get eval_wer during evaluation? Here are the environment settings from running transformers-cli env:

- `transformers` version: 4.21.0
- Platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0+cu116 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

Thanks!

Okay, the problem is solved! I realized that evaluating every epoch is not feasible for my dataset, and that I set eval_steps too large for the evaluation.
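
In case it helps someone else, here is a rough sketch of step-based settings together with a small sanity check. The step counts are hypothetical and `check_step_settings` is a helper I made up for this post, not part of transformers. It assumes that with `load_best_model_at_end=True` the save and evaluation strategies have to match and `save_steps` has to be a multiple of `eval_steps`.

```python
# Hypothetical value: small enough that evaluation runs several times per training run.
eval_every = 200

step_based_kwargs = dict(
    evaluation_strategy="steps",
    eval_steps=eval_every,
    save_strategy="steps",
    save_steps=2 * eval_every,  # a multiple of eval_steps
    logging_strategy="steps",
    logging_steps=eval_every,
    metric_for_best_model="wer",
    greater_is_better=False,
    load_best_model_at_end=True,
)


def check_step_settings(kwargs: dict, total_train_steps: int) -> list:
    """Return a list of human-readable problems with the step settings."""
    problems = []
    if kwargs["evaluation_strategy"] != kwargs["save_strategy"]:
        problems.append("evaluation and save strategies differ")
    if kwargs["save_steps"] % kwargs["eval_steps"] != 0:
        problems.append("save_steps is not a multiple of eval_steps")
    if kwargs["eval_steps"] > total_train_steps:
        problems.append("eval_steps exceeds the total number of training steps")
    return problems


print(check_step_settings(step_based_kwargs, total_train_steps=1000))
```

These keyword arguments can then be passed along with the rest, e.g. `TrainingArguments(output_dir=..., **step_based_kwargs)`.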