Large Open Dataset Allowed?

Would a large, open-source dataset to which users can upload scraped and validated data (with more fine-grained permissions governing who can read and write data) comply with HuggingFace policies and terms of service? The data scraped by each user would be aggregated individually into the large main dataset. It would include text scraped from Reddit, Twitter/X, etc., and would be used to train models.