DOI Metadata and Findability

I asked previously about DOI data backups, and I now have a related question regarding the metadata and findability of the archived (DOI’d) revisions of datasets and models.

This is a multi-part issue, which I’ll illustrate with one of our Imageomics Institute datasets.

  1. The two DOI revisions of this dataset both point to the dataset at its current commit (imageomics/2018-NEON-beetles · Datasets at Hugging Face). Specifically, revision a596e65 has DOI 10.57967/hf/6890, which links to the dataset in its current form, and so does the first DOI, 10.57967/hf/5252, at revision 7b3731d. When I paste the URL imageomics/2018-NEON-beetles · Datasets at Hugging Face into a browser, it should point to the repository at that particular commit (imageomics/2018-NEON-beetles at 7b3731daca1f91931f8086d6782ab9150ab8ce26).
  2. Related to (1), as far as I can tell this information (the commit associated with each DOI revision) is only visible to people with access to the repository settings. As a result, authors must manually include links to earlier revisions so that others can access them.
  3. There seems to be a general gap in the metadata attached to the DOIs: it should include the commit to which each DOI applies, as well as author information. The new option to add authors when generating a DOI is helpful. However, DOI generation doesn’t read the dataset’s pretty name; the auto-generated citation just uses the repo name instead (this seems like an easy fix).
  4. Related to my original question about storage: where is this metadata maintained, and is there a way to update the authors associated with DOIs that were created before the author-specification option was available?
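Until the DOI metadata itself carries the commit, the workaround described in (2) amounts to keeping a DOI-to-commit table by hand and publishing revision-pinned links. A minimal sketch, assuming the `https://huggingface.co/datasets/<repo_id>/tree/<revision>` URL pattern for browsing a repo at a given revision; the table contents come from the post, while `DOI_TO_COMMIT`, `pinned_dataset_url`, and `pinned_url_for_doi` are hypothetical names for illustration:

```python
# Hypothetical DOI -> commit table, transcribed by hand from the repository's
# settings page (currently the only place this mapping appears to be visible).
DOI_TO_COMMIT = {
    "10.57967/hf/5252": "7b3731daca1f91931f8086d6782ab9150ab8ce26",
    # Abbreviated SHA exactly as given in the post; a full SHA is safer in URLs.
    "10.57967/hf/6890": "a596e65",
}


def pinned_dataset_url(repo_id: str, revision: str) -> str:
    """Return a Hub URL that browses a dataset repo at one fixed revision."""
    return f"https://huggingface.co/datasets/{repo_id}/tree/{revision}"


def pinned_url_for_doi(repo_id: str, doi: str) -> str:
    """Look up the commit recorded for a DOI and build its revision-pinned URL."""
    try:
        commit = DOI_TO_COMMIT[doi]
    except KeyError:
        raise KeyError(f"no commit recorded for DOI {doi!r}") from None
    return pinned_dataset_url(repo_id, commit)


if __name__ == "__main__":
    for doi in DOI_TO_COMMIT:
        print(doi, "->", pinned_url_for_doi("imageomics/2018-NEON-beetles", doi))
```

The same commit can also be passed as `revision=` to `huggingface_hub.snapshot_download(repo_id=..., repo_type="dataset", revision=...)` to fetch the files as they were at that DOI, but all of this only works if the mapping is maintained outside the DOI record.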

Ideally, since Hugging Face already has version control built in, the DOIs for a particular model or dataset could be handled analogously to Zenodo: a version-agnostic DOI always links to the latest version of the record, while version-specific DOIs let anyone access any particular version (e.g., Collaborative Distributed Science Guide). Without a way to link to the actual DOI’d content, the DOI doesn’t really serve as a proper persistent identifier.
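The Zenodo-style model above can be sketched as a small data structure: one concept DOI that resolves to whatever version is newest, plus version DOIs that each stay bound to a single commit. This is a hypothetical illustration, not how Hugging Face or Zenodo implement DOIs; the class name, the placeholder concept DOI, and the tree-URL pattern are assumptions, and only the two version DOIs and commits come from the post.

```python
from dataclasses import dataclass, field

HUB = "https://huggingface.co/datasets"


@dataclass
class VersionedRecord:
    """Hypothetical Zenodo-style record: one concept DOI plus per-version DOIs.

    Each version DOI is bound to one commit SHA; the concept DOI has no commit
    of its own and always resolves to the most recently added version.
    """

    repo_id: str
    concept_doi: str
    # (version DOI, commit SHA) pairs, oldest first.
    versions: list[tuple[str, str]] = field(default_factory=list)

    def add_version(self, doi: str, commit: str) -> None:
        self.versions.append((doi, commit))

    def resolve(self, doi: str) -> str:
        """Map any DOI of this record to a revision-pinned Hub URL."""
        if doi == self.concept_doi:
            if not self.versions:
                raise LookupError("record has no versions yet")
            commit = self.versions[-1][1]  # latest version wins
        else:
            commits = dict(self.versions)
            if doi not in commits:
                raise KeyError(doi)
            commit = commits[doi]  # version DOIs never move
        return f"{HUB}/{self.repo_id}/tree/{commit}"


record = VersionedRecord("imageomics/2018-NEON-beetles", "hypothetical-concept-doi")
record.add_version("10.57967/hf/5252", "7b3731daca1f91931f8086d6782ab9150ab8ce26")
record.add_version("10.57967/hf/6890", "a596e65")
```

With this layout, `record.resolve("10.57967/hf/5252")` keeps pointing at commit 7b3731d after newer versions are added, while the concept DOI follows the latest one, which is the behavior the current DOIs are missing.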

I know this is an ongoing project with continued improvements, so I suppose I’m asking whether these kinds of FAIR features are currently in the works, and whether there’s a timeline for them?

Thanks!
