Using a custom metric on the Huggingface Hub

I am developing a dataset and an accompanying custom metric in the Hub repository cpllab/syntaxgym (main branch).

I can load my custom dataset with e.g. datasets.load_dataset("cpllab/syntaxgym", "subordination_src-src").
But I cannot load my metric with the same Huggingface Hub reference:

> metric = datasets.load_metric("cpllab/syntaxgym", download_mode=datasets.DownloadMode.FORCE_REDOWNLOAD)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-9-189ce62668b8> in <module>()
----> 1 metric = datasets.load_metric("cpllab/syntaxgym", download_mode=datasets.DownloadMode.FORCE_REDOWNLOAD)

1 frames
/usr/local/lib/python3.7/dist-packages/datasets/load.py in metric_module_factory(path, revision, download_config, download_mode, force_local_path, dynamic_modules_path, **download_kwargs)
   1372                 ) from None
   1373     else:
-> 1374         raise FileNotFoundError(f"Couldn't find a metric script at {relative_to_absolute_path(combined_path)}.")
   1375 
   1376 

FileNotFoundError: Couldn't find a metric script at /content/cpllab/syntaxgym/syntaxgym.py.

A peek at the metric loading logic suggests that only metrics without / in their name trigger a lookup on the Huggingface hub. This doesn’t seem right to me – how can I make my custom metric available as a community member on the Huggingface hub?

Hi @jgauthier,

Downloading metrics from the Hub is not a feature of datasets: load_metric can only load them locally or from the datasets GitHub repository. However, in the newly released evaluate library you can have metrics on the Hugging Face Hub! Note that the metric should be in a Space and not a Dataset repo.
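
For reference, once the metric lives in a Space, loading it could look roughly like the sketch below. The helper name load_hub_metric is made up for illustration, and the repo id in the usage note assumes a Space with that name exists and contains the metric script.

```python
def load_hub_metric(repo_id: str):
    """Load a community metric hosted in a Space on the Hugging Face Hub.

    `load_hub_metric` is a hypothetical helper name used for illustration;
    the only library call it relies on is `evaluate.load`.
    """
    # Imported lazily so that defining this helper does not require the
    # library to be installed (pip install evaluate).
    import evaluate

    # Ids of the form "namespace/name" are looked up among Spaces on the Hub.
    return evaluate.load(repo_id)
```

Usage would then be something like metric = load_hub_metric("cpllab/syntaxgym"), which needs network access and the evaluate package installed.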

You can use the evaluate-cli to set up the Space and fill in a template for you:

evaluate-cli create "Syntax Gym"

Note that there is a bug in the current release that won’t let you push the template to the Hub. However, if you install from main, it should work:

pip install git+https://github.com/huggingface/evaluate.git

Also note that you need to change a few things in the code (as you will see in the template), because we moved from datasets.Metric to evaluate.EvaluationModule. Let me know how it goes! Here is an example of a custom metric: lvwerra/aweeesoooome_metric at main
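
To give a rough idea of what the port from datasets.Metric involves, here is a minimal sketch of a module built on evaluate.EvaluationModule. It is not the generated template: the class name, the integer features, and the exact-match scoring are placeholders, and it assumes your installed version exports evaluate.EvaluationModule and evaluate.EvaluationModuleInfo (later templates may use evaluate.Metric and evaluate.MetricInfo instead). In a real Space script the class would sit at module level; it is wrapped in a function here only so the scoring helper can be tried without the library installed.

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference (placeholder scoring)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(predictions)


def build_module_class():
    """Return a sketch module class; imports are deferred until this is called."""
    import datasets
    import evaluate

    # Hypothetical class name; assumes EvaluationModule/EvaluationModuleInfo
    # are exported by the installed evaluate version.
    class SyntaxGymSketch(evaluate.EvaluationModule):
        def _info(self):
            # Describes the module and the expected input columns.
            return evaluate.EvaluationModuleInfo(
                description="Placeholder description.",
                citation="",
                inputs_description="predictions and references: lists of ints",
                features=datasets.Features(
                    {
                        "predictions": datasets.Value("int64"),
                        "references": datasets.Value("int64"),
                    }
                ),
            )

        def _compute(self, predictions, references):
            # Receives the accumulated inputs and returns a dict of scores.
            return {"exact_match_rate": exact_match_rate(predictions, references)}

    return SyntaxGymSketch
```

With evaluate and datasets installed, build_module_class()().compute(predictions=[1, 0, 1], references=[1, 1, 1]) should exercise the module locally before you push it to a Space.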