I’d like to host a dataset on Hugging Face.
Is there a size limit? My dataset is 106GB.
I tried to add it but I can’t even add the files to the git repo (I’m getting "fatal: Out of memory, realloc failed" after git add), although I’m using
Hi, @PaulLerner! Thank you for the question.
git-lfs has a size limitation of 2GB – and only a few GB larger, if you’re using GitHub Enterprise. As an alternative, you can store your dataset somewhere else (e.g., cloud storage) and reference that location in your data loading script.
Hi @PaulLerner and @dynamicwebpaige!
As far as I know, we do have datasets that are several terabytes in size. As Paige suggested, you can store your dataset in an alternate location, but it is also possible (as far as I know) to upload files above 5GB after running
huggingface-cli lfs-enable-largefiles .
This is similar to the solution in Uploading files larger than 5GB to model hub.
I hope this helps!
Yes, unlike GitHub, we do not have a limitation on LFS-stored file sizes on the Hugging Face Hub.
However, for ease of use and download speeds, we do advise users to chunk their large files into blobs of at most 20GB (or even 5GB if possible).
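In case it helps, here is a minimal sketch of one way to do that chunking in plain Python before adding the files to the repo. The function names (split_file, join_files) and the ".partNNNNN" naming scheme are just assumptions for illustration, not something the Hub requires; any splitting tool that produces pieces under the size you want should work as well.

```python
import shutil
from pathlib import Path


def split_file(src, out_dir, chunk_size, buffer_size=1024 * 1024):
    """Split `src` into numbered parts of at most `chunk_size` bytes each.

    Data is streamed in `buffer_size` pieces, so a large file is never
    loaded into memory in full. Returns the part paths in order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    index = 0
    with src.open("rb") as f:
        while True:
            data = f.read(min(buffer_size, chunk_size))
            if not data:
                break  # end of the source file
            # Hypothetical naming scheme: <name>.part00000, <name>.part00001, ...
            part = out_dir / f"{src.name}.part{index:05d}"
            written = 0
            with part.open("wb") as out:
                while data:
                    out.write(data)
                    written += len(data)
                    remaining = chunk_size - written
                    if remaining == 0:
                        break  # this part is full
                    data = f.read(min(buffer_size, remaining))
            parts.append(part)
            index += 1
    return parts


def join_files(parts, dest, buffer_size=1024 * 1024):
    """Concatenate `parts` (in the given order) back into a single file."""
    with Path(dest).open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as f:
                shutil.copyfileobj(f, out, buffer_size)


if __name__ == "__main__":
    import tempfile

    # Tiny demo: 2500 bytes split into parts of at most 1000 bytes.
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "data.bin"
        original.write_bytes(bytes(range(250)) * 10)
        pieces = split_file(original, Path(tmp) / "parts", chunk_size=1000)
        print([p.stat().st_size for p in pieces])
```

For a real dataset you would pass something like chunk_size=5 * 1024**3 (5GB), and whoever downloads the parts can rebuild the original with join_files(sorted(parts), "data.bin").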
Thanks for your answers!
huggingface-cli lfs-enable-largefiles . seems to do the job