Large Open Dataset Allowed?

camyc · June 5, 2024, 5:16pm

Would a large, open-source dataset to which users can upload scraped and validated data (with more fine-grained permissions over who can read and write data) comply with Hugging Face's policies and terms of service? Each user's scraped data would be aggregated individually into the large main dataset. The data would include text scraped from Reddit, Twitter/X, etc., and would be used to train models.
