Webhook to track dataset downloads

Can I create a webhook to track when a dataset (any of my datasets) is downloaded?
We are planning to share a series of datasets publicly on the Hub and would like to track usage with a webhook triggered by any action (downloads, comments, etc.) on any of our datasets, so that we can then post a message on Discord. Is that possible?

Thanks in advance

Hi @DataBoutique! Sorry for the long delay. It is possible to set up webhooks to be notified of changes to a repo (i.e. any new commit) and of community activity (e.g. comments, PRs, discussions). You can have a look at this guide to learn how to configure them: Webhooks. A single webhook can also watch all of your datasets at once, if you are interested in that.
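Since the goal is to forward these events to Discord, here is a minimal sketch of the relay logic. It assumes the webhook payload carries an `event` object with `action` and `scope` keys and a `repo` object with a `name` key (please check the Webhooks guide for the exact payload shape). The function names and the `User-Agent` string are hypothetical. The only Discord-specific part is posting a JSON body with a `content` field to a Discord webhook URL that you create yourself.

```python
import json
import urllib.request


def format_discord_message(payload: dict) -> str:
    """Build a one-line summary from a Hub webhook payload.

    Assumed payload shape (verify against the Webhooks guide):
    {"event": {"action": ..., "scope": ...}, "repo": {"name": ...}}
    """
    event = payload.get("event", {})
    repo = payload.get("repo", {})
    action = event.get("action", "unknown")
    scope = event.get("scope", "unknown")
    name = repo.get("name", "unknown repo")
    return f"[{scope}] {action} on {name}"


def post_to_discord(discord_webhook_url: str, content: str) -> int:
    """POST a plain-text message to a Discord webhook and return the HTTP status."""
    body = json.dumps({"content": content}).encode("utf-8")
    request = urllib.request.Request(
        discord_webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Explicit User-Agent as a precaution; the value is arbitrary.
            "User-Agent": "hf-webhook-relay/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

You would call `format_discord_message` on the JSON body your endpoint receives from the Hub, then pass the result to `post_to_discord` together with your own Discord webhook URL.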

Regarding download counts, it is not possible to watch them with a webhook. This would not be scalable for us given how often some models/datasets are downloaded (1M+ per day for some popular models). However, you can get the number of downloads of a dataset from the API. For example, bigcode/the-stack has been downloaded 3396 times over the last month and 87245 times overall. This information is available at this URL: https://huggingface.co/api/datasets/bigcode/the-stack?expand[]=downloads&expand[]=downloadsAllTime.
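If you want to poll that endpoint from a script, something along these lines should work. This is only a sketch: it assumes the JSON response exposes the counts under the keys `downloads` and `downloadsAllTime` (mirroring the `expand[]` parameter names), and the helper names are made up for the example. The query string is percent-encoded, which is equivalent to the URL above.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://huggingface.co/api/datasets"


def build_downloads_url(repo_id: str) -> str:
    """Return the API URL asking for both download counters of a dataset."""
    query = urllib.parse.urlencode(
        [("expand[]", "downloads"), ("expand[]", "downloadsAllTime")]
    )
    return f"{API_ROOT}/{repo_id}?{query}"


def parse_download_counts(payload: dict) -> tuple:
    """Extract (last month, all time) counts; a missing key yields None."""
    # Key names are an assumption based on the expand[] parameters.
    return payload.get("downloads"), payload.get("downloadsAllTime")


def fetch_download_counts(repo_id: str) -> tuple:
    """Call the Hub API and return (last month, all time) download counts."""
    with urllib.request.urlopen(build_downloads_url(repo_id)) as response:
        return parse_download_counts(json.load(response))
```

Calling `fetch_download_counts("bigcode/the-stack")` on a schedule (e.g. once a day) and posting the result to Discord would give you a usage report without needing a download webhook.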

The API solution looks like the right fit, thanks a lot!
