Hi,
How can I evaluate an existing model trained on the BoolQ dataset WITHOUT retraining it?
I’m trying the HF “evaluate” package with its question-answering evaluator, but I got an error.
Here’s my main code:
import torch
import evaluate
from datasets import load_dataset
from evaluate import evaluator
from transformers import AutoModelWithHeads, RobertaTokenizer, RobertaModel

# Load the base model and activate the BoolQ adapter from AdapterHub
model = AutoModelWithHeads.from_pretrained("roberta-base")
adapter_name = model.load_adapter("AdapterHub/roberta-base-pf-boolq", source="hf")
model.active_adapters = adapter_name

# Tokenizer matching the base model (created here but not yet passed to the evaluator)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Named task_evaluator rather than eval to avoid shadowing the Python builtin
task_evaluator = evaluator("question-answering")
results = task_evaluator.compute(
    model_or_pipeline=model,
    data="boolq",
    metric="accuracy",
    question_column="question",
    context_column="passage",
    id_column=None,
    label_column="answer",
)
It gave me this error:
/opt/anaconda3/envs/hugging-face/lib/python3.7/site-packages/evaluate/evaluator/question_answering.py in compute(self, model_or_pipeline, data, subset, split, metric, tokenizer, strategy, confidence_level, n_resamples, device, random_state, question_column, context_column, id_column, label_column, squad_v2_format)
189 context_column=context_column,
190 id_column=id_column,
--> 191 label_column=label_column,
192 )
193
/opt/anaconda3/envs/hugging-face/lib/python3.7/site-packages/evaluate/evaluator/question_answering.py in prepare_data(self, data, question_column, context_column, id_column, label_column)
104 "context_column": context_column,
105 "id_column": id_column,
--> 106 "label_column": label_column,
107 },
108 )
/opt/anaconda3/envs/hugging-face/lib/python3.7/site-packages/evaluate/evaluator/base.py in check_required_columns(self, data, columns_names)
301 if column_name not in data.column_names:
302 raise ValueError(
--> 303 f"Invalid `{input_name}` {column_name} specified. The dataset contains the following columns: {data.column_names}."
304 )
305
ValueError: Invalid `id_column` None specified. The dataset contains the following columns: ['question', 'answer', 'passage'].
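Judging only from the traceback above, the check in base.py seems to look up every supplied column name in data.column_names, so passing None fails the same way a misspelled column name would. Here is a minimal stand-in that reproduces the message. The function below is a hypothetical simplification written for illustration, not the library's actual code, and the column list is copied from the error message:

```python
def check_required_columns(available_columns, columns_names):
    """Hypothetical simplification of the check shown in the traceback."""
    for input_name, column_name in columns_names.items():
        # None is treated like any other name, so it is never "in" the list
        if column_name not in available_columns:
            raise ValueError(
                f"Invalid `{input_name}` {column_name} specified. "
                f"The dataset contains the following columns: {available_columns}."
            )


# Columns reported by the error message for the boolq dataset
boolq_columns = ["question", "answer", "passage"]

# The same arguments as in the compute() call above
requested = {
    "question_column": "question",
    "context_column": "passage",
    "id_column": None,
    "label_column": "answer",
}

try:
    check_required_columns(boolq_columns, requested)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

If this reading is right, the evaluator needs an id column that actually exists in the dataset, and boolq does not have one.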
I’m not sure I’m on the right track for evaluating a BoolQ model.
Please advise. Thanks a lot!
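For context, the `answer` column in boolq holds one yes/no label per question, so the evaluation in question boils down to comparing one boolean prediction per example against that column. A minimal sketch of that loop follows. It assumes a `predict` callable, which is a hypothetical placeholder for whatever ends up wrapping the adapter model, and the toy examples are made up purely to exercise the function:

```python
def boolq_accuracy(examples, predict):
    """Fraction of examples where predict(question, passage) equals the label.

    `predict` is a hypothetical callable returning True or False; no training
    happens here, the model is only queried.
    """
    total = 0
    correct = 0
    for example in examples:
        prediction = predict(example["question"], example["passage"])
        correct += int(prediction == example["answer"])
        total += 1
    if total == 0:
        raise ValueError("Cannot compute accuracy on an empty set of examples.")
    return correct / total


# Made-up rows using the same column names as boolq
toy_examples = [
    {"question": "is the sky blue", "passage": "The sky is blue.", "answer": True},
    {"question": "is fire cold", "passage": "Fire is hot.", "answer": False},
    {"question": "is water wet", "passage": "Water is wet.", "answer": True},
    {"question": "is ice hot", "passage": "Ice is cold.", "answer": False},
]


def always_true(question, passage):
    # Trivial baseline standing in for a real model
    return True


toy_accuracy = boolq_accuracy(toy_examples, always_true)
print(toy_accuracy)
```

With a real model, `predict` would tokenize the question and passage together and map the model's output to True or False. How exactly to do that with the adapter head is part of what I'm asking.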