How to get a list of all Huggingface download redirections to whitelist?

I work inside a secure corporate VPN network, so I’m unable to download Huggingface models using from_pretrained commands. However, I can request the security team to whitelist certain URLs needed for my use-case.

The security team has already whitelisted the ‘huggingface.co’ and ‘cdn-lfs.huggingface.co’ URLs. I can now download the files from the repo, but the from_pretrained loading functions still don’t work.

I think the requests are getting blocked while being redirected internally. So, is there a way to find out all the intermediate (hop) URLs I need to have whitelisted for the load functions to work?
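One way to see which hosts a single download actually touches is to follow the redirects by hand and record each hostname. The following is only a rough sketch using the Python standard library, not an official list or tool: the helper names head_once and redirect_hosts are made up for illustration, and it only covers the one URL you give it, so other requests made by from_pretrained may not show up.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_once(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_hosts(url, fetch=head_once, max_hops=10):
    """Follow redirects manually and return the hostnames seen, in order.

    `fetch` is injectable (hypothetical design choice) so the logic can be
    tried out without network access.
    """
    hosts = []
    for _ in range(max_hops):
        host = urlsplit(url).hostname
        if host not in hosts:
            hosts.append(host)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hosts
```

Run from a network where downloads succeed, calling redirect_hosts on a file URL of the form https://huggingface.co/<repo>/resolve/main/<file> should list every host in that file's redirect chain. The result can differ per file and per region, so it is worth trying a large weight file as well as a small config file.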

Thanks in advance.

hi @ayadav

Can you give more details, like error logs, etc?

Is there any update on this?

Having the same issue. Is there a listing of URLs that we can whitelist? Also, if there are any planned changes to the URLs, is there a roadmap so we can stay on top of them?

I’ll try to supply error logs next time I encounter it, but it has come up multiple times for me as well. When we try to call <model>.from_pretrained("repo") in our Databricks environment, we get an SSL error about not having the proper certificate. We’ve also gotten a max_retries error, but I can’t say for certain whether that was due to the underlying whitelist request. There are ways around this, but if HF published a domain list that we could use to properly configure our environments, that would be very useful!

Hi! Any updates on this, or any alternatives to follow in the meantime? I am about to try downloading a model, going offline, and then pushing it up to Databricks. If you have a better idea, or have tried this before, I’d like to hear about it.

I have the same issue when the download comes from a different CDN hostname.
Our IT team added
http://huggingface.co/ and
http://cdn-lfs.huggingface.co/ to the whitelist.

After that, downloading meta-llama/Llama-2-13b-chat works, for example.
But it fails when the CDN host becomes cdn-lfs-us-1.huggingface.co or another regional host.

Update? Same issue here. I’ve gotten around by using my home network to connect to the hf repo and download to my workstation cache. Then I reconnect to VPN into the corporate network and copy from my workstation to the server cache. This is painfully slow.
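If the copy step is the slow part, one thing that may help is bundling the cached repo into a single archive before transferring it. This is only a sketch under some assumptions: it assumes the default hub cache layout, where a model repo "org/name" lives in a folder named "models--org--name" under the cache directory (by default ~/.cache/huggingface/hub), and the helper names pack_repo and unpack_repo are made up for illustration.

```python
import tarfile
from pathlib import Path


def repo_folder_name(repo_id):
    """Map a model repo id like "org/name" to its cache folder name."""
    return "models--" + repo_id.replace("/", "--")


def pack_repo(cache_dir, repo_id, archive_path):
    """Bundle one cached model repo into a single uncompressed tar file."""
    folder = Path(cache_dir) / repo_folder_name(repo_id)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} not found; download the model first")
    # No compression: model weights usually compress poorly, so plain tar
    # avoids spending CPU time for little gain.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(folder, arcname=folder.name)
    return Path(archive_path)


def unpack_repo(archive_path, cache_dir):
    """Extract an archive made by pack_repo into the target cache directory."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(cache_dir)
```

On the workstation you would call pack_repo with the local cache directory and the repo id, copy the one archive file across, and call unpack_repo on the server with its cache directory. Only unpack archives you created yourself.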

FWIW, a curl -IL test shows redirection (302 responses) from the repo when I am connected to the corporate network, and the download fails. On my home network, however, there are no redirects and the download succeeds. Is there an issue with general redirection handling?

Hey, was anyone able to find a solution for this?

Related:

Note that for security reasons, we recently updated the domain for our CDN; in order to be able to download files you also need to whitelist the following domains:

We have created an SSL inspection exception for the FQDNs listed by pierric, plus these two:

But it still does not work. We always hit the same error, SSL: CERTIFICATE_VERIFY_FAILED, when trying to download sentence-transformers/all-MiniLM-L6-v2.
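To narrow down which hostname is still being intercepted, it may help to attempt a verified TLS handshake against each host separately. The sketch below uses only the Python standard library; the helper names tls_probe and check_hosts are made up for illustration. One caveat: Python's default SSL context uses the system certificate store, which may differ from the CA bundle the download library uses, so treat the output as a hint rather than a definitive answer.

```python
import socket
import ssl


def tls_probe(host, port=443, timeout=10):
    """Open a TLS connection with certificate verification enabled.

    Raises ssl.SSLCertVerificationError if the presented certificate
    chain is not trusted or does not match the hostname.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host):
            pass


def check_hosts(hosts, probe=tls_probe):
    """Return a dict mapping each host to "ok" or a short failure reason."""
    results = {}
    for host in hosts:
        try:
            probe(host)
            results[host] = "ok"
        except ssl.SSLCertVerificationError as exc:
            # Typical when a proxy re-signs traffic with a CA we do not trust.
            results[host] = f"certificate rejected: {exc}"
        except OSError as exc:
            # Covers DNS failures, refused connections, timeouts, other TLS errors.
            results[host] = f"connection failed: {exc}"
    return results
```

For example, check_hosts(["huggingface.co", "cdn-lfs.huggingface.co", "cdn-lfs-us-1.huggingface.co"]) run from inside the corporate network would show which of the hosts mentioned in this thread still fail verification.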
