Evaluate with pipeline on custom dataset

Hey everyone

I have several custom datasets. I used one of them to train a NER model, and now I would like to measure its performance on the other datasets, ideally using the Evaluate library (I assume this would be the cleanest method). However, I have not been able to do that, since a new error comes up every time :sweat_smile:

Here’s my code (only the key parts; I assume the rest is correct), followed by an example of the dataset. Can you tell me what I’m doing wrong?

# Imports
...

# Pipeline definition
pipe = pipeline(
    'token-classification', 
    model = 'path/to/my/model', 
    device = 0
)

# Load data
...
my_dataset
# Dataset({
#    features: ['tokens', 'ner_tags_numeric'],
#    num_rows: 400
# })

# Evaluate model
results = task_evaluator.compute(
    model_or_pipeline = pipe,
    data = my_dataset,
    metric = ...,  # placeholder: here I would like to evaluate precision and recall
    input_column = 'tokens',
    label_column = 'ner_tags_numeric',
)

This is the error I get:
NotImplementedError: References provided as integers, but the reference column is not a Sequence of ClassLabels.

For every row of my_dataset, ‘tokens’ is a list of words, while ‘ner_tags_numeric’ is a list of integers. I had to cast it manually from strings because the model complained that the type should be int32.
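
The error message suggests the evaluator cannot interpret bare integers unless the column carries the label names. One possible workaround, sketched here under assumptions, is to map the integer ids back to string tags and point `label_column` at the new column. The label list `LABEL_NAMES`, the helper `ids_to_tags`, the new column name `ner_tags`, and the toy row are all hypothetical; the real list must follow the same id order the model was trained with.

```python
# Hypothetical label list: index i is the string tag for integer id i.
# The real order must match the ids used when the model was trained.
LABEL_NAMES = ["O", "B-TEST", "I-TEST"]


def ids_to_tags(tag_ids, label_names):
    """Map a list of integer tag ids to their string labels."""
    return [label_names[i] for i in tag_ids]


# Toy row shaped like one row of my_dataset (made-up values).
example = {"tokens": ["MMSE", "score", "stable"], "ner_tags_numeric": [1, 2, 0]}
example["ner_tags"] = ids_to_tags(example["ner_tags_numeric"], LABEL_NAMES)
print(example["ner_tags"])  # ['B-TEST', 'I-TEST', 'O']
```

With the `datasets` library, the same helper could presumably be applied to every row through `my_dataset.map(...)`. Alternatively, the column could be declared as a `Sequence` of `ClassLabel`, which is what the error message asks for.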

This is a brief extract from the dataset. It was annotated using this website, and I think it’s in some spaCy format:

{
    "classes": [
        "TRATTAMENTO FARMACOLOGICO",
        "TEST",
        "DIAGNOSI E COMORBIDITÀ",
        "SINTOMI COGNITIVI",
        "SINTOMI NEUROPSICHIATRICI"
    ],
    "annotations": [
        [
            "ANAMNESI FARMACOLOGICA ...",
            {
                "entities": [
                    [
                        39,
                        45,
                        "TRATTAMENTO FARMACOLOGICO"
                    ]
                ]
            }
        ],
        [
            "ANAMNESI PATOLOGICA REMOTA E PATOLOGIE ATTIVE ...",
            {
                "entities": [
                    [
                        106,
                        125,
                        "DIAGNOSI E COMORBIDITÀ"
                    ],
                    ...
                ]
            }
        ]
    ]
}
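
The extract above stores entities as character offsets `[start, end, label]`, while `my_dataset` holds tokens and per-token tags. How one was derived from the other is not shown, so the following is only a minimal sketch of one way to do it, assuming whitespace tokenization and a BIO tagging scheme (both are assumptions, as are the function name `spans_to_bio` and the sample sentence).

```python
import re


def spans_to_bio(text, entities):
    """Whitespace-tokenize `text` and derive BIO tags from (start, end, label) character spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for ent_start, ent_end, label in entities:
            # A token belongs to an entity if their character ranges overlap.
            if tok_start < ent_end and tok_end > ent_start:
                # The first overlapping token gets B-, the following ones I-.
                prefix = "B" if tok_start <= ent_start else "I"
                tag = f"{prefix}-{label}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Made-up sentence; the span [9, 21) covers "aspirina 100".
tokens, tags = spans_to_bio(
    "Terapia: aspirina 100 mg",
    [[9, 21, "TRATTAMENTO FARMACOLOGICO"]],
)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

The resulting string tags can then be turned into integer ids with a fixed label list, which keeps the id order explicit and reusable across datasets.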