Add dataset revision to a created dataset

Hey there,

Some of the Dataset.from_xxx methods accept a DatasetInfo parameter when loading/creating a Dataset, while others don’t. My use case is automatically creating Datasets from JSON and pushing them to the Hub once they are built. I want to add a revision/version to these datasets but don’t know how: I can’t add it via Dataset.push_to_hub, nor via DatasetInfo.

Any ideas how to achieve this?

Best,
Vladimir

maybe tagging @lhoestq @albertvillanova?


Thanks @julien-c. This seems to be a growing use case: synthetic datasets that are automatically generated, cleaned up, tagged/versioned, pushed to the Hugging Face Hub, and subsequently downloaded and used to fine-tune models.

I opened an issue to request a way to add a git tag using huggingface_hub: "Add a git tag to a repository", Issue #1014 in huggingface/huggingface_hub on GitHub.
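For anyone landing here later, this is roughly the workflow I have in mind, as a sketch rather than a confirmed recipe. It assumes a huggingface_hub release that exposes HfApi.create_tag (as far as I know, later versions do) and that you are already logged in. The names version_tag and push_and_tag, the file path, and the repo id are all hypothetical placeholders.

```python
import re
from typing import Optional


def version_tag(version: str) -> str:
    """Hypothetical helper: turn 'MAJOR.MINOR.PATCH' into a git tag name like 'v1.2.0'."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"v{version}"


def push_and_tag(json_path: str, repo_id: str, version: str,
                 message: Optional[str] = None) -> str:
    """Build a Dataset from a JSON file, push it to the Hub, then tag the result.

    Assumption: the installed huggingface_hub provides HfApi.create_tag.
    """
    # Imported here so the helper above works without these packages installed.
    from datasets import Dataset
    from huggingface_hub import HfApi

    tag = version_tag(version)
    dataset = Dataset.from_json(json_path)
    dataset.push_to_hub(repo_id)
    # With no explicit revision, the tag points at the current head of the default branch.
    HfApi().create_tag(repo_id, tag=tag, tag_message=message, repo_type="dataset")
    return tag


if __name__ == "__main__":
    # Placeholder path and repo id; replace with your own before running.
    push_and_tag("data.json", "my-user/my-synthetic-dataset", "1.0.0",
                 message="first generated release")
```

A consumer should then be able to pin that version with load_dataset("my-user/my-synthetic-dataset", revision="v1.0.0"), since revision accepts a branch, tag, or commit.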
