Hi all,
I’m trying to evaluate a model on several NER datasets. Evaluation works for CoNLL-2003 English and CoNLL-2002 Spanish, but I get an IndexError when I evaluate on CoNLL-2002 Dutch.
Here is my code:
from evaluate import evaluator
from datasets import load_dataset
task_evaluator = evaluator("token-classification")
data = load_dataset("conll2002", "nl", split="test")
results = task_evaluator.compute(
    model_or_pipeline="xlm-roberta-large-finetuned-conll03-english",
    data=data,
    metric="seqeval",
)
print(results)
And here is the error:
Cell In [5], line 1
----> 1 results = task_evaluator.compute(
2 model_or_pipeline="xlm-roberta-large-finetuned-conll03-english",
3 data=data,
4 metric="seqeval",
5 )
6 print(results)
File /opt/conda/envs/spacy_env/lib/python3.9/site-packages/evaluate/evaluator/token_classification.py:253, in TokenClassificationEvaluator.compute(self, model_or_pipeline, data, subset, split, metric, tokenizer, strategy, confidence_level, n_resamples, device, random_state, input_column, label_column, join_by)
251 # Compute predictions
252 predictions, perf_results = self.call_pipeline(pipe, pipe_inputs)
--> 253 predictions = self.predictions_processor(predictions, data[input_column], join_by)
254 metric_inputs.update(predictions)
256 # Compute metrics from references and predictions
File /opt/conda/envs/spacy_env/lib/python3.9/site-packages/evaluate/evaluator/token_classification.py:125, in TokenClassificationEvaluator.predictions_processor(self, predictions, words, join_by)
122 token_index = 0
123 for word_offset in words_offsets:
124 # for each word, we may keep only the predicted label for the first token, discard the others
--> 125 while prediction[token_index]["start"] < word_offset[0]:
126 token_index += 1
128 if prediction[token_index]["start"] > word_offset[0]: # bad indexing
IndexError: list index out of range
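To show what I think the traceback is telling me, here is a minimal sketch of the loop around line 125, run on made-up data. The helper names (`words_to_offsets`, `align_first_token`), the example sentence, and the token dicts are all invented for illustration and are not the library's code. It assumes the pipeline returned fewer token predictions than there are word starts; I have not confirmed that this is what actually happens on the Dutch test set.

```python
def words_to_offsets(words, join_by=" "):
    """Hypothetical helper: (start, end) character offsets of each word
    in the string join_by.join(words)."""
    offsets = []
    start = 0
    for word in words:
        end = start + len(word)
        offsets.append((start, end))
        start = end + len(join_by)
    return offsets


def align_first_token(prediction, words, join_by=" "):
    """Simplified re-creation of the loop shown in the traceback:
    keep the label of the first token of each word."""
    labels = []
    token_index = 0
    for word_start, _ in words_to_offsets(words, join_by):
        # Skip tokens that start before this word. There is no bounds
        # check, so this raises IndexError once the tokens run out.
        while prediction[token_index]["start"] < word_start:
            token_index += 1
        if prediction[token_index]["start"] > word_start:
            labels.append("O")  # no token starts exactly at this word
        else:
            labels.append(prediction[token_index]["entity"])
    return labels


words = ["Dat", "is", "Amsterdam"]  # word starts at characters 0, 4, 7

# One token prediction per word: alignment succeeds.
full = [
    {"start": 0, "entity": "O"},
    {"start": 4, "entity": "O"},
    {"start": 7, "entity": "B-LOC"},
]
print(align_first_token(full, words))

# Assumed failure mode: the last token prediction is missing.
short = full[:2]
try:
    align_first_token(short, words)
except IndexError as exc:
    print("IndexError:", exc)
```

If this reading is right, the error means that for at least one Dutch example the loop went past the last token prediction while looking for a token starting at or after some word's offset.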
Python 3.9
Transformers 4.24.0.dev0
Evaluate 0.3.0
Torch 1.12.1
Thank you.