Issue with Custom Nested Metrics

dspoka · October 27, 2021, 10:33am

transformers: 4.10.3
datasets: 1.12.1

I'm trying to follow the example in datasets/super_glue.py (huggingface/datasets on GitHub, master branch) to make my own custom metric:

def _info(self):
    return datasets.MetricInfo(
        description='_DESCRIPTION',
        citation='_CITATION',
        inputs_description='_KWARGS_DESCRIPTION',
        features=datasets.Features({
            "predictions": {
                "1": datasets.Value("int64"),
                "2": datasets.Value("int64"),
                "3": datasets.Value("int64"),
            },
            "references": datasets.Value("int64"),
        }),
        codebase_urls=[],
        reference_urls=[],
        # format='numpy'
    )

And somewhere in my code I call this:
metric.add_batch(predictions=outputs['predictions'], references=outputs['references'])

If predictions is not nested, i.e. it is just datasets.Value("int64"), then it works.

What am I doing wrong here?

Stack trace:
    metric.add_batch(predictions=outputs['predictions'], references=outputs['references'])
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/metric.py", line 431, in add_batch
    batch = self.info.features.encode_batch(batch)
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/features.py", line 1080, in encode_batch
    encoded_batch[key] = [encode_nested_example(self[key], obj) for obj in column]
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/features.py", line 1080, in <listcomp>
    encoded_batch[key] = [encode_nested_example(self[key], obj) for obj in column]
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/features.py", line 886, in encode_nested_example
    k: encode_nested_example(sub_schema, sub_obj) for k, (sub_schema, sub_obj) in utils.zip_dict(schema, obj)
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/features.py", line 885, in <dictcomp>
    return {
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/utils/py_utils.py", line 99, in zip_dict
    yield key, tuple(d[key] for d in dicts)
  File "/home/dspokoyn/.cache/pypoetry/virtualenvs/unit-7vxG7edj-py3.7/lib/python3.7/site-packages/datasets/utils/py_utils.py", line 99, in <genexpr>
    yield key, tuple(d[key] for d in dicts)
TypeError: string indices must be integers
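The last frames are in zip_dict, which looks up every key of the "predictions" schema in each example of the column. The snippet below is a simplified stand-in for that logic, written for illustration only (the function names are hypothetical and this is not the library's actual code). It suggests one plausible cause of the error: if outputs['predictions'] is a dict of columns such as {"1": [...], "2": [...], "3": [...]} instead of a list with one dict per example, iterating over it yields the key strings "1", "2", "3", and indexing a string with "1" raises TypeError: string indices must be integers.

```python
from itertools import chain


def zip_dict_sketch(*dicts):
    """Simplified stand-in for datasets' zip_dict (illustrative, not the library code).

    For every distinct key across the inputs, yield the key and a tuple with the
    value looked up in each input. If one input is a str, d[key] raises TypeError.
    """
    seen = set()
    for key in chain(*dicts):
        if key not in seen:
            seen.add(key)
            yield key, tuple(d[key] for d in dicts)


def encode_column_sketch(schema, column):
    """Roughly mirrors encode_batch for one column: one nested lookup per element.

    Iterating `column` is expected to yield one example (a dict) at a time.
    """
    return [
        {key: value for key, (_, value) in zip_dict_sketch(schema, example)}
        for example in column
    ]


# Placeholder schema standing in for the nested datasets.Value("int64") features.
schema = {"1": "int64", "2": "int64", "3": "int64"}

# Column-oriented input (hypothetical values): iterating a dict yields its keys,
# so each "example" is the string "1", "2" or "3".
columnar = {"1": [0, 1], "2": [2, 3], "3": [4, 5]}
try:
    encode_column_sketch(schema, columnar)
except TypeError as err:
    print(err)  # exact wording depends on the Python version

# Row-oriented input: a list with one dict per example works.
rows = [{"1": 0, "2": 2, "3": 4}, {"1": 1, "2": 3, "3": 5}]
print(encode_column_sketch(schema, rows))
```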

lhoestq · November 5, 2021, 5:02pm

Hi! Can you provide an example of the predictions and references that you pass to the metric?

You defined these feature types:

features=datasets.Features({
  "predictions": {
    "1": datasets.Value("int64"),
    "2": datasets.Value("int64"),
    "3": datasets.Value("int64"),
  },
  "references": datasets.Value("int64"),
})

So your predictions must look like this:

predictions = [{"1": i, "2": j, "3": k}]

and your references like this:

references = [r]

where i, j, k and r are integers.
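If the model outputs are stored column-wise (one list per key), they can be reshaped into the list-of-dicts layout above before calling add_batch. A minimal sketch, assuming outputs['predictions'] is a dict mapping "1", "2", "3" to equal-length lists of ints; columns_to_rows is a hypothetical helper, not part of datasets:

```python
from typing import Dict, List


def columns_to_rows(columns: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Turn {"1": [a0, a1], "2": [b0, b1]} into [{"1": a0, "2": b0}, {"1": a1, "2": b1}]."""
    keys = list(columns)
    lengths = {len(columns[k]) for k in keys}
    if len(lengths) > 1:
        # zip() would silently truncate, so fail loudly on ragged columns instead.
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    # zip(*...) walks the columns in lockstep, producing one tuple per example.
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]


# Hypothetical example values.
predictions = columns_to_rows({"1": [0, 1], "2": [2, 3], "3": [4, 5]})
print(predictions)  # [{'1': 0, '2': 2, '3': 4}, {'1': 1, '2': 3, '3': 5}]
```

The result could then be passed as metric.add_batch(predictions=predictions, references=outputs['references']), with outputs['references'] a flat list of ints of the same length.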

Can you check that these are indeed the feature types you want to use, and that your data matches this structure?
