It’s probably going to be over 500TB…
If you plan to upload more than 300GB of data to Hugging Face in a single repository, it’s best to contact HF in advance by email: website@huggingface.co
Also, if you’re training on a large dataset with Hugging Face’s `datasets` library or torch, it seems that sharding the dataset (splitting it into many smaller files) makes things run more stably. @lhoestq (a rough sketch below)
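For anyone curious what that looks like in practice, here’s a minimal sketch using the `datasets` library’s `Dataset.shard()` to split one dataset into multiple Parquet files. The file names and shard count are placeholders I made up, not anything prescribed by HF; pick a shard count based on your dataset size and number of dataloader workers.

```python
from datasets import load_dataset

# Hypothetical local file; swap in your own data source.
ds = load_dataset("json", data_files="train.jsonl", split="train")

num_shards = 64  # illustrative; tune to dataset size / worker count
for index in range(num_shards):
    # shard() returns a deterministic 1/num_shards slice of the dataset
    shard = ds.shard(num_shards=num_shards, index=index)
    shard.to_parquet(f"data/train-{index:05d}-of-{num_shards:05d}.parquet")
```

With the data in many smaller files like this, loading and streaming can be parallelized across workers instead of hammering one giant file.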