Accuracy metric throws during evaluation on sequence classification task

I’m fine-tuning a BertForSequenceClassification model and I want to compute the accuracy on the evaluation set after each training epoch. However, the evaluation step fails when I call evaluate:

TypeError: 'list' object is not callable

Here’s a minimal example showing the error (see also this Colab notebook):

Standard tokenizer, toy model:

from transformers import AutoTokenizer, BertConfig, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig(num_hidden_layers=0)
model = BertForSequenceClassification(config)

Grab the IMDb dataset and build a small evaluation set:

from datasets import load_dataset

imdb = load_dataset("imdb")

def mapper(x):
  return tokenizer(x["text"], max_length=384, truncation=True, padding="max_length")

eval_dataset = imdb["train"].select(range(100)).map(mapper, remove_columns=["text"])

Set up the trainer, including compute_metrics=[accuracy]:

from datasets import load_metric
from transformers import Trainer, TrainingArguments

accuracy = load_metric("accuracy")

args = TrainingArguments(
    "imdb",
    num_train_epochs=2,
    report_to="none",
    evaluation_strategy="epoch"
)

trainer = Trainer(
    model,
    args,
    eval_dataset=eval_dataset,
    compute_metrics=[accuracy],
    tokenizer=tokenizer,
)

Evaluate:

trainer.evaluate()

Raises:

TypeError: 'list' object is not callable

Isn’t this something that should “just work”?

compute_metrics should be a function that takes a namedtuple (of type EvalPrediction) and returns a dictionary mapping metric names to metric values.
Take a look at the text classification example or the course section on the Trainer.
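
For example, something along these lines should work (a minimal sketch reusing the accuracy metric you already loaded; the predicted class is the argmax over the logits):

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is an EvalPrediction: (predictions, label_ids)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

Then pass compute_metrics=compute_metrics to the Trainer (the function itself, not wrapped in a list).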
