Dataset revision number

Hello,
I couldn’t find a way to retrieve the revision number of a given dataset without downloading any part of it.
Once the dataset is downloaded, I know I can use dataset["train"].info.version.
Without downloading, I haven’t been able to figure it out so far.
I tried HfApi via api.dataset_info(dataset_name), but it doesn’t return it.
Neither does api.list_datasets(dataset_name).
Any pointers on how to get there, folks?

I know the topic “List all available revision?” asks about the same thing and has had no answer since Dec 2023.

Answered here: List all available revision? - #4 by nielsr

Thank you very much @nielsr for your kind answer.
The link you’re pointing to gives repo branch names (which seems to fit the requirement of @borgr in the post you linked).
I’m looking for the revision number of the dataset itself (like v1.12.43, as returned by dataset["train"].info.version in Python once the dataset has been downloaded).
Do you know of a way to achieve this?
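
For readers comparing the two notions, here is a minimal sketch of how Hub-side branch and tag names can be listed. It assumes the linked answer refers to HfApi.list_repo_refs from huggingface_hub, which the thread does not state explicitly; ref_names and hub_refs are hypothetical helper names introduced only for this illustration.

```python
def ref_names(refs) -> dict:
    """Collect branch and tag names from a GitRefs-like object."""
    return {
        "branches": [branch.name for branch in refs.branches],
        "tags": [tag.name for tag in refs.tags],
    }


def hub_refs(dataset_name: str) -> dict:
    """List the git refs of a dataset repo on the Hub (needs network access)."""
    # Imported lazily so ref_names can be used without huggingface_hub installed.
    from huggingface_hub import HfApi

    refs = HfApi().list_repo_refs(dataset_name, repo_type="dataset")
    return ref_names(refs)
```

These names are git refs of the repository, so they are a different thing from the value stored in dataset["train"].info.version.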

Will ping @Wauplin here

I’m myself looking for revision number of the datasets (like v1.12.43, as returned by the “offline/once_downloaded” dataset["train"].info.version python command).

This information is not Hub-specific but dataset-specific AFAIK.
@lhoestq @albertvillanova would you have a solution for that?

You can use datasets.get_dataset_config_info:

  • To get the DatasetInfo of the default (or single) configuration:
In [1]: from datasets import get_dataset_config_info
In [2]: dataset_info = get_dataset_config_info("kilt_wikipedia")
In [3]: dataset_info.version
Out[3]: 1.0.0
  • To get the DatasetInfo for a particular config:
In [1]: from datasets import get_dataset_config_info
In [2]: dataset_info = get_dataset_config_info("wikimedia/wikipedia", "20231101.ca")
In [3]: dataset_info.version
Out[3]: 0.0.0

Please note that most no-script data-only datasets do not explicitly set a specific “version” number and will return 0.0.0 (the default one).
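
Putting the two calls above together, a small wrapper could look like the sketch below. It is only an illustration: describe_version and remote_version are hypothetical names, and treating 0.0.0 as “unset” is an assumption based on the note above, since a dataset could in principle set that value on purpose.

```python
from typing import Optional

DEFAULT_VERSION = "0.0.0"  # value reported when a dataset does not set a version


def describe_version(version) -> str:
    """Turn a DatasetInfo.version value into a readable label."""
    if version is None or str(version) == DEFAULT_VERSION:
        return "unset (default 0.0.0)"
    return str(version)


def remote_version(dataset_name: str, config_name: Optional[str] = None) -> str:
    """Look up the version through get_dataset_config_info (needs network access)."""
    # Imported lazily so describe_version works without `datasets` installed.
    from datasets import get_dataset_config_info

    info = get_dataset_config_info(dataset_name, config_name)
    return describe_version(info.version)
```

With this, remote_version("kilt_wikipedia") and remote_version("wikimedia/wikipedia", "20231101.ca") would correspond to the two sessions shown above.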

Thank you very much, Albert.
Lots of valuable info here.
That’s what I was hoping existed.
I will put it all to good use.

Thank you all very much for your persistence in helping a dude out.
Much appreciated.
Rock on!
