KeyError: 'eval_accuracy' when running trainer

I'm trying to train a model on the ‘medmcqa’ dataset, but the trainer throws KeyError: 'eval_accuracy' after the first epoch.

Here is my trainer code:

training_args = TrainingArguments(
    output_dir="biolinkbert-base-mcmqa",
    label_names=["cop"],
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    learning_rate=3e-6,
    fp16=True,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=32,
    num_train_epochs=30,
    seed=42,
    weight_decay=1e-3,
    push_to_hub=True,
    logging_strategy="epoch",
    eval_delay=0,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    callbacks=[early_stopper],
)

trainer.train()

And here is my compute_metrics function:

import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

Any help is greatly appreciated!


Use this:

import evaluate
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

Try the above code.

That didn’t appear to work.

After placing a print statement in the compute_metrics function, it seems that it is never being called.
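
A minimal sanity check (a sketch, run against the trainer defined above): call trainer.evaluate() by hand and inspect the returned keys. The Trainer prefixes every metric name from compute_metrics with "eval_", and metric_for_best_model="accuracy" is resolved to "eval_accuracy", so if compute_metrics never runs, that key is missing and the KeyError from the title follows.

# Run a single evaluation pass and print which metric keys come back.
# "eval_accuracy" should appear here if compute_metrics is being called.
metrics = trainer.evaluate()
print(sorted(metrics.keys()))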

I also got the same error, but I can't recall how I solved it. Here are a few observations from your code; try all of these:

1. Use metric = load_metric("accuracy") from datasets in place of accuracy = evaluate.load("accuracy"):

from datasets import load_metric
import numpy as np

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
2. Only one of these should be needed:

tokenizer=tokenizer,
data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer)

I think there is a package version issue with either transformers or evaluate, though I'm not sure. Anyway, try the above. Also, why is there only a single label name in your code? I didn't get that.

I’ll try those things, thanks. Just strange because I’ve used the code pretty much as it is with another multiple choice dataset recently and it worked fine.

The label name “cop” corresponds to the dataset column containing an integer from 0-3 representing the correct multiple choice answer.
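
One more thing I'm double-checking (a hunch, not a confirmed fix): with label_names=["cop"], the Trainer only gathers labels during evaluation, and therefore only calls compute_metrics, if the "cop" column actually survives tokenization and reaches the model inputs.

# Verify the label column survives preprocessing; if it was dropped,
# the Trainer finds no labels and silently skips compute_metrics.
print(tokenized_dataset["validation"].column_names)
print(tokenized_dataset["validation"][0]["cop"])  # expected: an int in 0-3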

I am also getting this error: “TypeError: only size-1 arrays can be converted to Python scalars”. Basically, I am fine-tuning the facebook/bart-base model with the samsum dataset in Amazon SageMaker.
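
Not sure if it is the same root cause as the original post, but with seq2seq models like facebook/bart-base, eval_pred.predictions can be a tuple of arrays rather than a single array, and calling np.argmax on that is a common source of errors like this one. A minimal sketch of the usual guard (assuming metric is whatever metric object you loaded):

import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Models that return several outputs pack them into a tuple;
    # keep only the first array (the logits) before taking argmax.
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    predictions = np.argmax(predictions, axis=-1)
    return metric.compute(predictions=predictions, references=labels)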

Any fix for this yet?

I get the same error. Any fix for this issue?

Hello.

I don't know if it is the problem, but in your compute_metrics, this line:

predictions = np.argmax(predictions, axis=1)

Shouldn’t it be axis=-1?
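
For 2-D logits of shape (batch, num_choices) the two are equivalent, so that alone may not be the bug, but axis=-1 is the safer habit. A quick check:

import numpy as np

logits = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1]])  # shape (batch, num_choices)
print(np.argmax(logits, axis=1))   # [1 0]
print(np.argmax(logits, axis=-1))  # [1 0] -- identical for 2-D input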

This worked for me:

from datasets import load_metric
import numpy as np

# Load metric
metric_name = "f1"
metric = load_metric(metric_name)

# Define metrics
def compute_metrics(eval_pred):

  predictions, labels = eval_pred
  predictions = np.argmax(predictions, axis=1)

  # 'micro', 'macro', etc. are for multi-label classification. If you are running a binary classification, leave it as default or specify "binary" for average
  return metric.compute(predictions=predictions, references=labels, average="binary")
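
If you combine this with load_best_model_at_end=True, the TrainingArguments also have to point at the same key the function returns; a sketch of the matching pieces (assuming the f1 setup above, with a placeholder output_dir):

training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",  # looked up as "eval_f1" in the metrics dict
)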

For the people for whom this was working: what version are you using? I'm on transformers 4.32.0 and this does not work.