Would a large, open-source dataset to which users can upload scraped and validated data (with more fine-grained permissions over who can read and write data) comply with Hugging Face's policies and terms of service? Each user's scraped data would be aggregated individually into the large main dataset. The data would include text scraped from Reddit, Twitter/X, etc., and would be used to train models.