KeyError: 'eval_accuracy' when running trainer

I'm trying to train a model on the ‘medmcqa’ dataset, but the trainer throws KeyError: 'eval_accuracy' after the first epoch.

Here is my trainer code:

training_args = TrainingArguments(
    output_dir="biolinkbert-base-mcmqa",
    label_names=["cop"],
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    learning_rate=3e-6,
    fp16=True,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=32,
    num_train_epochs=30,
    seed=42,
    weight_decay=1e-3,
    push_to_hub=True,
    logging_strategy="epoch",
    eval_delay=0,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    callbacks=[early_stopper],
)

trainer.train()

And here is my compute_metrics function:

import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

Any help is greatly appreciated!


Use this:

import evaluate
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

Try the above code.

That didn’t appear to work.

After placing a print statement in the compute_metrics function, it seems that it is never being called.
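
A minimal sanity check (a sketch, run against the trainer defined above): call trainer.evaluate() by hand and inspect the returned keys. The Trainer prefixes every metric name from compute_metrics with "eval_", and metric_for_best_model="accuracy" is resolved to "eval_accuracy", so if compute_metrics never runs, that key is missing and the KeyError from the title follows.

# Run a single evaluation pass and print which metric keys come back.
# "eval_accuracy" should appear here if compute_metrics is being called.
metrics = trainer.evaluate()
print(sorted(metrics.keys()))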

I also got the same error, but I can't recall how I solved it. Here are a few observations from your code; try all of these:

1. Use metric = load_metric("accuracy") from datasets in place of accuracy = evaluate.load("accuracy"):

from datasets import load_metric
import numpy as np

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
2. Only one of these should be needed:

tokenizer=tokenizer,
data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer)

I think there is a package version issue with either transformers or evaluate, though I'm not sure. Anyway, try the above. Also, why is there only a single label name in your code? I didn't get that.

I’ll try those things, thanks. Just strange because I’ve used the code pretty much as it is with another multiple choice dataset recently and it worked fine.

The label name “cop” corresponds to the dataset column containing an integer from 0-3 representing the correct multiple choice answer.
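
One more thing I'm double-checking (a hunch, not a confirmed fix): with label_names=["cop"], the Trainer only gathers labels during evaluation, and therefore only calls compute_metrics, if the "cop" column actually survives tokenization and reaches the model inputs.

# Verify the label column survives preprocessing; if it was dropped,
# the Trainer finds no labels and silently skips compute_metrics.
print(tokenized_dataset["validation"].column_names)
print(tokenized_dataset["validation"][0]["cop"])  # expected: an int in 0-3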

I am also getting this error: “TypeError: only size-1 arrays can be converted to Python scalars”. Basically, I am fine-tuning the facebook/bart-base model with the samsum dataset in Amazon SageMaker.
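
Not sure if it is the same root cause as the original post, but with seq2seq models like facebook/bart-base, eval_pred.predictions can be a tuple of arrays rather than a single array, and calling np.argmax on that is a common source of errors like this one. A minimal sketch of the usual guard (assuming metric is whatever metric object you loaded):

import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Models that return several outputs pack them into a tuple;
    # keep only the first array (the logits) before taking argmax.
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    predictions = np.argmax(predictions, axis=-1)
    return metric.compute(predictions=predictions, references=labels)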

Any fix for this yet?

I get the same error. Any fix for this issue?

Hello.

I don't know if it is the problem, but in your compute_metrics, this line:

predictions = np.argmax(predictions, axis=1)

Shouldn’t it be axis=-1?
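
For 2-D logits of shape (batch, num_choices) the two are equivalent, so that alone may not be the bug, but axis=-1 is the safer habit. A quick check:

import numpy as np

logits = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1]])  # shape (batch, num_choices)
print(np.argmax(logits, axis=1))   # [1 0]
print(np.argmax(logits, axis=-1))  # [1 0] -- identical for 2-D input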

This worked for me:

from datasets import load_metric
import numpy as np

# Load metric
metric_name = "f1"
metric = load_metric(metric_name)

# Define metrics
def compute_metrics(eval_pred):

  predictions, labels = eval_pred
  predictions = np.argmax(predictions, axis=1)

  # 'micro', 'macro', etc. are for multi-label classification. If you are running a binary classification, leave it as default or specify "binary" for average
  return metric.compute(predictions=predictions, references=labels, average="binary")
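
If you combine this with load_best_model_at_end=True, the TrainingArguments also have to point at the same key the function returns; a sketch of the matching pieces (assuming the f1 setup above, with a placeholder output_dir):

training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",  # looked up as "eval_f1" in the metrics dict
)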

For the people for whom this was working: what version are you using? I'm on transformers 4.32.0 and this does not work.