Is there a size limit for dataset hosting?

Hey,

I’d like to host a dataset on Hugging Face.
Is there a size limit? My dataset is 106GB.
I tried to upload it, but I can’t even add the files to the git repo: git add fails with fatal: Out of memory, realloc failed, even though I’m using git-lfs.

Best,

Hi, @PaulLerner! Thank you for the question. :slight_smile:

Unfortunately, Git LFS on GitHub has a file size limit of 2GB (and only a few GB more if you’re using GitHub Enterprise). As an alternative, you can store your dataset in another location (e.g. cloud storage) and reference that location in your data loading script.
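
For example, a minimal loading script along these lines could fetch the data from an external URL at load time. Everything below (URL, class name, fields) is just a placeholder sketch, not a ready-made script:

```python
import json

import datasets

# Hypothetical external location of the raw data (e.g. a file in cloud storage).
_DATA_URL = "https://storage.example.com/my_dataset/train.jsonl"


class MyDataset(datasets.GeneratorBasedBuilder):
    """Toy loading script that fetches its data from outside the Hub."""

    def _info(self):
        return datasets.DatasetInfo(
            description="Example dataset stored in external cloud storage.",
            features=datasets.Features({"text": datasets.Value("string")}),
        )

    def _split_generators(self, dl_manager):
        # The download manager fetches and caches the remote file locally.
        local_path = dl_manager.download(_DATA_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"filepath": local_path},
            )
        ]

    def _generate_examples(self, filepath):
        # One JSON object per line, with a single "text" field in this toy example.
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": json.loads(line)["text"]}
```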

Hi @PaulLerner and @dynamicwebpaige!

As far as I know, we do have datasets of several terabytes on the Hub. As Paige suggested, you can store your dataset in an alternate location, but it is also possible to upload files larger than 5GB with huggingface-cli lfs-enable-largefiles .

This is similar to the solution in Uploading files larger than 5GB to model hub.
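
For reference, here is a rough sketch of that workflow, scripted with Python’s subprocess (the same commands can just as well be run directly in a terminal). The repo URL and local folder are placeholders:

```python
import subprocess

# Placeholders: replace with your own dataset repo and local folder.
REPO_URL = "https://huggingface.co/datasets/<username>/<dataset_name>"
LOCAL_DIR = "my_dataset_repo"


def run(*cmd, cwd=None):
    """Run a command and fail loudly if it errors."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)


run("git", "lfs", "install")                 # one-time git-lfs setup
run("git", "clone", REPO_URL, LOCAL_DIR)
# Allow pushing individual files larger than 5GB to the Hub.
run("huggingface-cli", "lfs-enable-largefiles", ".", cwd=LOCAL_DIR)
# Copy your data files into LOCAL_DIR first (and check that their extensions
# are tracked by git-lfs in .gitattributes), then commit and push as usual.
run("git", "add", ".", cwd=LOCAL_DIR)
run("git", "commit", "-m", "Add dataset files", cwd=LOCAL_DIR)
run("git", "push", cwd=LOCAL_DIR)
```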

I hope this helps! :rocket: :full_moon:

Yes, unlike GitHub (:innocent::wink:), we do not have a limit on LFS-stored file sizes on the Hugging Face Hub.

However, for ease of use and download speed, we do advise users to chunk their large files into blobs of at most 20GB (or even 5GB if possible).
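
If your dataset is one huge archive, a quick sketch of such chunking (sizes and file names purely illustrative) could look like this; the parts can be concatenated back together with cat after download:

```python
CHUNK_SIZE = 5 * 1024**3    # at most 5 GiB per part
BUFFER_SIZE = 64 * 1024**2  # copy in 64 MiB buffers to keep memory usage flat


def split_file(path, chunk_size=CHUNK_SIZE, buffer_size=BUFFER_SIZE):
    """Write path.part000, path.part001, ... and return the number of parts."""
    part = 0
    with open(path, "rb") as src:
        while True:
            buf = src.read(buffer_size)
            if not buf:
                break  # reached the end of the input file
            with open(f"{path}.part{part:03d}", "wb") as dst:
                written = 0
                while buf:
                    dst.write(buf)
                    written += len(buf)
                    if written >= chunk_size:
                        break
                    buf = src.read(min(buffer_size, chunk_size - written))
            part += 1
    return part


if __name__ == "__main__":
    # Hypothetical archive from the original question.
    print(split_file("my_dataset.tar"), "parts written")
```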

Hi everyone,

Thanks for your answers! huggingface-cli lfs-enable-largefiles . seems to do the job :slight_smile:
