Hi, is there a policy for when a new dataset should be placed on the Hub and when it’s best to submit it to the Datasets GitHub repository?
Most new datasets should be placed on the Hub.
Exceptions are, for instance, when there’s no “natural” organization or user to attach it to, AND it’s an “important” dataset.
According to our docs (the “Share” page):
In some cases it makes more sense to open a PR on GitHub:
- when you need the dataset to be reviewed
- when you need long-term maintenance from the Hugging Face team
- when there’s no clear org name / namespace that you can put the dataset under
Ah, in this case I’m even more honoured to have had a corpus placed in the Datasets GitHub repository by the HF team! Thank you!