Uploading files larger than 5GB to model hub

I want to upload CTRL to the model hub. I followed the instructions in the documentation, but they seem to apply only to smaller models (<5GB). Issues have been raised here and here, but the problem still seems unresolved.

I tried the approach shared by @julien-c (aws s3 cp …) but got the error Unable to locate credentials, which suggests that it is meant for HF staff.

I am getting the error could not push some refs because the 5GB limit is exceeded. Can you please suggest a workaround, if one exists?

@patrickvonplaten

Did you install git-lfs? git lfs install

You also have to install our custom transfer agent for files >5GB: transformers-cli lfs-enable-largefiles

Let me know if this helps!
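
For reference, the two setup steps above can be combined with the usual add/commit/push sequence. This is only a sketch, assuming it is run from inside a clone of the model repo; push_large_file is a hypothetical helper name, and the file name and commit message are whatever you pass in.

```shell
# Sketch: prepare the current repo for LFS files larger than 5GB and push one.
# push_large_file is a hypothetical helper, not part of git or transformers-cli.
push_large_file() {
  file=$1
  message=${2:-"Add $file"}

  # Enable the git-lfs hooks for this user/repo.
  git lfs install || return 1
  # Install the custom transfer agent for files >5GB in the current repo.
  transformers-cli lfs-enable-largefiles . || return 1
  # Make sure the file is stored through LFS rather than as a regular blob.
  git lfs track "$file" || return 1
  git add .gitattributes "$file" || return 1
  git commit -m "$message" || return 1
  git push
}
```

Usage would look like: push_large_file pytorch_model.bin "Add model weights" (the file name here is just an example).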

Thanks for replying. I have git lfs installed and I tried the custom transfer agent earlier. I tried it again just now and it still gives me:

EOFoading LFS objects:  50% (1/2), 5.0 GB | 0 B/s                                                                                                                                                            
error: failed to push some refs to 'https://huggingface.co/prajjwal1/ctrl_discovery_1'

I managed to do it on the server. For some reason, it won’t work on my local machine.

Are you able to upload files to S3 from your local machine? It might be a proxy-related issue.

If you want more info, you can try GIT_TRACE=1 GIT_CURL_VERBOSE=1 git push
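
Since a large push can take a long time, it may help to keep the trace in a file instead of letting it scroll by. A minimal sketch, assuming the trace is written to stderr (git's default for GIT_TRACE=1); push_with_trace and the default log file name are hypothetical:

```shell
# Sketch: run `git push` with tracing enabled and save everything to a log.
# push_with_trace is a hypothetical helper; the log path is its first argument.
push_with_trace() {
  logfile=${1:-push-trace.log}
  # Merge stderr (where the trace goes) into stdout, then show and save it.
  GIT_TRACE=1 GIT_CURL_VERBOSE=1 git push 2>&1 | tee "$logfile"
}
```

The log can be long and may contain URLs or headers you would rather not post publicly, so it is worth skimming before sharing it.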

I have loaded many models with earlier versions of transformers, which did not rely on lfs. I’m not sure what the issue is on my local machine, but it’s working on the server. I may use the flags if I need a trace.

@prajjwal1 git-lfs does its own networking (upload) calls with libcurl so the behavior when behind a proxy might be different from python-requests.

Let me know if you get more info in the future, this is interesting!

Hello @julien-c ,

I am facing similar issues when attempting to upload a large file (just above 5GB) using git-lfs. I have a dual boot on my machine, and unfortunately did not manage to upload from either Windows or Ubuntu 18.04.

I have git-lfs installed with the environment variable GIT_LFS_SKIP_SMUDGE=1 and ran

(transformers) λ transformers-cli lfs-enable-largefiles .
Local repo set up for largefiles

Interestingly, both the Windows and Ubuntu versions return the following warning after adding the file:

E:\path\to\gpt-neo-1.3B (main -> origin)
(transformers) λ git add rust_model.ot
Encountered 1 file(s) that may not have been copied correctly on Windows:
        rust_model.ot

See: `git lfs help smudge` for more details.

Attempting to push returns a similar warning:

(transformers) λ git push
EOFoading LFS objects: 100% (1/1), 5.3 GB | 0 B/s
error: failed to push some refs to 'https://huggingface.co/EleutherAI/gpt-neo-1.3B'

I read that git-lfs has issues with files larger than 4GB on Windows, but I am surprised to face issues on my Ubuntu partition. Am I missing something?

Just chiming in to say I’m getting the same errors on Arch. Git LFS is installed and enabled, adding a 5GB model gives the same warning about Windows, and then trying to push results in the same failed to push some refs error.

EDIT: Trying again a few hours later and it worked, guess it was just a blip.

cc’ing @pierric on this.

If you’re not on Windows, the Windows-related warning should be innocuous (I get it too).

@guillaume-be can you try again in case it was a temporary hiccup?

Thank you for the quick feedback!

I just tried again from Ubuntu and got the same error message:

(transformers) guillaume@guillaume-MS-7B78:~/gpt-neo-1.3B$ transformers-cli lfs-enable-largefiles .
Local repo set up for largefiles
(transformers) guillaume@guillaume-MS-7B78:~/gpt-neo-1.3B$ git lfs track rust_model.ot
Tracking "rust_model.ot"
(transformers) guillaume@guillaume-MS-7B78:~/gpt-neo-1.3B$ git add rust_model.ot 
Encountered 1 file(s) that may not have been copied correctly on Windows:
	rust_model.ot
See: `git lfs help smudge` for more details.
(transformers) guillaume@guillaume-MS-7B78:~/gpt-neo-1.3B$ git commit -m "Addition of Rust model"
[main 094438a] Addition of Rust model
 1 file changed, 3 insertions(+)
 create mode 100755 rust_model.ot
(transformers) guillaume@guillaume-MS-7B78:~/gpt-neo-1.3B$ git push
Username for 'https://huggingface.co': guillaume-be
Password for 'https://guillaume-be@huggingface.co': 
Username for 'https://huggingface.co': guillaume-be                                                    
Password for 'https://guillaume-be@huggingface.co': 
EOFoading LFS objects:   0% (0/1), 5.0 GB | 0 B/s                                                      
error: failed to push some refs to 'https://huggingface.co/EleutherAI/gpt-neo-1.3B'

I tried with and without the step git lfs track rust_model.ot, getting the same error both times. Unfortunately troubleshooting is time-consuming given the upload takes ~1 hour.

edit: I just tried the upload again with GIT_TRACE=1 GIT_CURL_VERBOSE=1. Coincidentally, I am trying to upload ~5.3GB, which should take just over an hour to complete. The upload crashes towards the end, and I believe this could very well be due to a timeout. It looks like the git-lfs worker message below sets "expires_in": 3600, which is exactly 1 hour.

Is there a setting on the server side (or something I should change on my side) to allow for uploads taking longer than an hour?

18:06:53.358398 trace git-lfs: xfer: Custom adapter worker 0 sending message: 
{
  "event": "upload",
  "oid": "[...]",
  "size": 5312743379,
  "path": "/home/guillaume/gpt-neo-1.3B/.git/lfs/objects/cc/d3/[...]",
  "action": {
    "href": "https://huggingface.co/EleutherAI/gpt-neo-1.3B.git/info/lfs/objects/complete_multipart?uploadId=[...]",
    "header": {
      "00001": "h[...]",
      "00002": "[...]",
      "chunk_size": "5000000000"
    },
    "expires_at": "0001-01-01T00:00:00Z",
    "expires_in": 3600
  }
}
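
As a rough check of the timeout theory, the sketch below computes the minimum sustained upload rate needed to finish a file of a given size before an expiry window closes. It assumes the window covers the whole upload and that the rate is constant; min_rate_bytes_per_sec is a hypothetical helper name, not part of git-lfs or transformers-cli.

```shell
# Minimum sustained upload rate, in bytes per second (rounded up), needed to
# send SIZE bytes within WINDOW seconds.
# Assumption: the expiry window covers the whole upload at a constant rate.
min_rate_bytes_per_sec() {
  size=$1
  window=$2
  echo $(( (size + window - 1) / window ))
}

# Values taken from the trace above: "size": 5312743379, "expires_in": 3600
min_rate_bytes_per_sec 5312743379 3600
```

For these values the result is 1475763 bytes per second, i.e. about 1.48 MB/s (roughly 11.8 Mbit/s) sustained for the full hour, so an upload that takes just over an hour would miss the window. Other window lengths can be passed as the second argument.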

Hi @guillaume-be, thank you for the debugging.
The problem you face is indeed probably related to the expiration time defined for multipart uploads (1 hour), which you are exceeding.

I just changed it to 3 hours, let me know how it goes!

@pierric works like a charm, thank you!
I am probably going to attempt an upload of Rust weights for GPT-Neo 2.7B tomorrow; hopefully I won’t exceed the 3-hour upload time :grinning:

@guillaume-be If your upload speed is too slow, a workaround might be to convert/upload from a server.

Absolutely, this will definitely be the way to go for large models!

I’m facing the same problem here. Can you tell me how I can change the expiration time, please?

Hello,

We just deployed a 24h expiration time for uploads.

Hope this will fix the slow upload bandwidth issue.

Regards.