Is there a size limit for dataset hosting

Hey,

I’d like to host a dataset on the Hugging Face Hub.
Is there a size limit? My dataset is 106GB.
I tried to add it, but I can’t even add the files to the git repo (I’m getting fatal: Out of memory, realloc failed after git add), even though I’m using git-lfs.

Best,


Hi, @PaulLerner! Thank you for the question. :slight_smile:

Unfortunately, git-lfs hosting on GitHub has a file size limitation of 2GB – and only a few GB larger, if you’re using GitHub Enterprise. As an alternative, you can store your dataset in another location (e.g. cloud storage) and reference that location in your data loading script.
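For anyone who goes the external-storage route, a loading script for the datasets library typically looks like the sketch below. The URL, class name, and the assumption of a JSON-lines file with a "text" field are all hypothetical placeholders, not part of any real dataset:

```python
# Sketch of a `datasets` loading script that downloads files from
# external cloud storage instead of storing them on the Hub.
import json

import datasets

# Hypothetical placeholder URL for the externally hosted data.
_DATA_URL = "https://storage.example.com/my-dataset/train.jsonl"


class MyDataset(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        # download() fetches (and caches) the remote file, returning a local path.
        path = dl_manager.download(_DATA_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN, gen_kwargs={"filepath": path}
            )
        ]

    def _generate_examples(self, filepath):
        # Yield (key, example) pairs, one per JSON line.
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": json.loads(line)["text"]}
```

Users would then call datasets.load_dataset on the repo containing this script; only the small script lives on the Hub, while the bytes stay in your storage.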


Hi @PaulLerner and @dynamicwebpaige!

As far as I know, the Hub already hosts datasets of several terabytes. As Paige suggested, you can store your dataset in an alternate location, but it is also possible (as far as I know) to upload files above 5GB after running huggingface-cli lfs-enable-largefiles . inside the repo.

This is similar to the solution in Uploading files larger than 5GB to model hub.
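For anyone landing here, the end-to-end workflow is roughly the following. The repo URL and file patterns are placeholders; this assumes git, git-lfs, and huggingface_hub are installed:

```shell
# Clone the dataset repo (placeholder URL -- substitute your own).
git clone https://huggingface.co/datasets/<user>/<dataset>
cd <dataset>

# Allow LFS files larger than 5GB in this repo.
huggingface-cli lfs-enable-largefiles .

# Track the large file types with LFS, then commit and push as usual.
git lfs track "*.tar"          # adjust the pattern to your file types
git add .gitattributes data/
git commit -m "Add dataset files"
git push
```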

I hope this helps! :rocket: :full_moon:


Yes, unlike GitHub (:innocent::wink:) we do not have a limitation on LFS-stored file sizes on the Hugging Face Hub.

However, for ease of use and download speeds, we do advise users to chunk their large files into blobs of at most 20GB (or even 5GB if possible).
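To follow that advice, you can split a large file into fixed-size parts before adding them to the repo. A minimal sketch (the function name and part-file naming are my own; pass e.g. chunk_size=5 * 1024**3 for 5GB parts):

```python
# Split a large file into numbered part files of at most chunk_size bytes.
import os


def split_file(path, chunk_size, out_dir="."):
    """Write path's bytes into part files; return the list of part paths."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part_path = os.path.join(
                out_dir, f"{os.path.basename(path)}.part{index:04d}"
            )
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts
```

Concatenating the parts in order (e.g. cat data.bin.part* > data.bin) reconstructs the original file after download.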


Hi everyone,

Thanks for your answers! huggingface-cli lfs-enable-largefiles . seems to do the job :slight_smile:


> No limit for individual files.

Ok wow, that sounds great! Does this mean that there is also no limit for all files combined?

Best,
Chris


I would also be interested in this question. I cannot find a clear limit for data hosting on the website.

As said above (Is there a size limit for dataset hosting - #4 by julien-c), there is no fixed limit.

@julien-c I’m also confused. So as long as each file is <20GB (ideally <5GB), I can upload as many files as I want, and effectively a dataset as large as I want? That seems amazing…

More details here: Upload files to the Hub

The summary table:

| Characteristic | Recommended | Tips |
| --- | --- | --- |
| Repo size | - | contact us for large repos (TBs of data) |
| Files per repo | <100k | merge data into fewer files |
| Entries per folder | <10k | use subdirectories in repo |
| File size | <5GB | split data into chunked files |
| Commit size | <100 files* | upload files in multiple commits |
| Commits per repo | - | upload multiple files per commit |
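In practice, the simplest way to stay within those recommendations is to let huggingface_hub drive the upload instead of raw git. A hedged sketch (the repo id and local folder are placeholders; assumes you are logged in via huggingface-cli login):

```python
# Upload a local folder of (pre-chunked) files to a dataset repo on the Hub.
from huggingface_hub import HfApi

api = HfApi()

# Placeholder repo id -- substitute your own namespace/name.
api.create_repo("username/my-dataset", repo_type="dataset", exist_ok=True)

api.upload_folder(
    folder_path="./my-dataset",    # local folder containing the chunked files
    repo_id="username/my-dataset",
    repo_type="dataset",
)
```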

@julien-c The limit as of now is 50 GB for each LFS file on upload. I did not see this in the documentation.

That’s correct. I think we should add it to the documentation somewhere, @severo.
