Accuracy metric throws during evaluation on sequence classification task

I’m fine-tuning a BertForSequenceClassification model and I want to compute the accuracy on the evaluation set after each training epoch. However, the evaluation step fails when I call evaluate:

TypeError: 'list' object is not callable

Here’s a minimal example showing the error (see also this Colab notebook):

Standard tokenizer, toy model:

from transformers import AutoTokenizer, BertConfig, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig(num_hidden_layers=0)
model = BertForSequenceClassification(config)

Grab the IMDb dataset and build a small evaluation set:

from datasets import load_dataset

imdb = load_dataset("imdb")

def mapper(x):
  return tokenizer(x["text"], max_length=384, truncation=True, padding="max_length")

eval_dataset = imdb["train"].select(range(100)).map(mapper, remove_columns=["text"])

Set up the trainer, including compute_metrics=[accuracy]:

from datasets import load_metric
from transformers import Trainer, TrainingArguments

accuracy = load_metric("accuracy")

args = TrainingArguments(
    "imdb",
    num_train_epochs=2,
    report_to="none",
    evaluation_strategy="epoch"
)

trainer = Trainer(
    model,
    args,
    eval_dataset=eval_dataset,
    compute_metrics=[accuracy],
    tokenizer=tokenizer,
)

Evaluate:

trainer.evaluate()

Raises:

TypeError: 'list' object is not callable

Isn’t this something that should “just work”?

compute_metrics should be a function that takes a namedtuple (of type EvalPrediction) and returns a dictionary mapping metric names to metric values.
Take a look at the text classification example or the course section on the Trainer.
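
For example, something along these lines should work (a minimal sketch reusing the accuracy metric you already loaded; the predicted class is the argmax over the logits):

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is an EvalPrediction: (predictions, label_ids)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

Then pass compute_metrics=compute_metrics to the Trainer (the function itself, not wrapped in a list).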
