When to use the Hub and when to use Datasets

Hi, is there a policy for when a new dataset should be placed on the Hub and when it’s best to submit it to the Datasets GitHub repository?

Most new datasets should be placed on the Hub.

Exceptions are, for instance, when there’s no “natural” organization or user to attach it to AND it’s an “important” dataset.

cc @lhoestq @osanseviero


According to our docs: Share

In some cases it makes more sense to open a PR on GitHub:

  • when you need the dataset to be reviewed
  • when you need long-term maintenance from the Hugging Face team
  • when there’s no clear org name / namespace that you can put the dataset under

Ah, in this case I’m even more honoured to have had a corpus placed in the Datasets GitHub repository by the Hugging Face team! Thank you!