Max individual file size for LFS files is 46.6GB

Hi,

There seems to be a new limit for datasets, and I was just wondering if this is expected behavior. I've been successfully pushing yearly zipped-Zarr stores of US precipitation radar data to openclimatefix/mrms · Datasets at Hugging Face, each around 100-130GB. I just tried to push an updated and fixed store for 2018 and am now getting a new error saying the max size is 46.6GB. I can split the Zarr stores into smaller ones, but it is simpler and easier to have a single large Zarr store that is read once.

(dgmr) [jacob@ocf mrms]$ git push
batch response: jects:   0% (0/1), 0 B | 0 B/s                                                                                                                                                                                               
You need to configure your repository to enable upload of files > 5GB.
Run "huggingface-cli lfs-enable-largefiles ./path/to/your/repo" and try again.

error: failed to push some refs to 'https://huggingface.co/datasets/openclimatefix/mrms'
(dgmr) [jacob@ocf mrms]$ huggingface-cli lfs-enable-largefiles .
Local repo set up for largefiles
(dgmr) [jacob@ocf mrms]$ git push
[0f7bef0d818fe9c05f7c821bf4b66f9218a4bae1a1ad2ae6274288f687704f28] Max individual file size for LFS files: 46.6GB: [422] Max individual file size for LFS files: 46.6GB                                                                      
error: failed to push some refs to 'https://huggingface.co/datasets/openclimatefix/mrms'
(dgmr) [jacob@ocf mrms]$ git push
[0f7bef0d818fe9c05f7c821bf4b66f9218a4bae1a1ad2ae6274288f687704f28] Max individual file size for LFS files: 46.6GB: [422] Max individual file size for LFS files: 46.6GB                                                                      
Uploading LFS objects: 100% (1/1), 125 GB | 0 B/s, done.
error: failed to push some refs to 'https://huggingface.co/datasets/openclimatefix/mrms'

Hi! I'd suggest splitting your files into smaller ones.

It is simpler for many systems to handle files that are around 1-2GB each. It makes it easier to parallelize data transfer and data processing without running into memory issues.
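For example, with xarray you could write each month of a yearly store out separately before zipping and uploading. A minimal sketch, where the paths, the "time" dimension name, and the chunk size are assumptions rather than the actual openclimatefix/mrms layout:

import xarray as xr

# Minimal sketch: split one yearly Zarr store into monthly stores.
# "mrms_2018.zarr", the "time" dimension and the chunk size of 24 are
# assumptions about the data layout.
ds = xr.open_zarr("mrms_2018.zarr")  # lazy open, nothing loaded into memory yet

for month, monthly in ds.groupby("time.month"):
    # Rechunk so each monthly store has uniform dask chunks before writing.
    monthly.chunk({"time": 24}).to_zarr(f"mrms_2018_{month:02d}.zarr", mode="w")

Each monthly store can then be zipped and pushed as its own LFS file, keeping every file well under the per-file limit.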

Okay yeah, I'll split them into smaller ones then. I was hoping to keep the large files, since xarray and zarr are built for accessing larger-than-memory datasets lazily, and they work a bit more efficiently if they don't have to read multiple files' metadata. But that is a small thing, so I'll keep the files a bit smaller. Thanks!
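For reference, the split stores can still be opened back lazily as one dataset. A rough sketch, assuming monthly stores named like the hypothetical ones above and a shared "time" dimension:

import glob
import xarray as xr

# Rough sketch: open each monthly store lazily and concatenate along time.
# The glob pattern matches the hypothetical monthly naming used above.
paths = sorted(glob.glob("mrms_2018_*.zarr"))
ds = xr.concat([xr.open_zarr(p) for p in paths], dim="time")
# Still lazy: data is only read when you select or compute on it.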