Code trying to download model from Hugging Face instead of using locally downloaded model

Hi all,

When I use the locally downloaded nvidia/NV-Embed-v1 model on my local workstation, it loads and runs fine. But when the same model and script are used on another server, the code tries to download the model instead of using the local copy.

The model was downloaded from Hugging Face (on both the workstation and the server) as:

git lfs install
git clone https://huggingface.co/nvidia/NV-Embed-v1
(# When prompted for a password, I used an access token with write permissions.)

Basic code snippet (same on both the workstation and the server):

from sentence_transformers import SentenceTransformer
# Load the local NV-Embed-v1 model using sentence-transformers with trust_remote_code
model_path = "/home/pc1/path/to/Download/NV-Embed-v1"
model = SentenceTransformer(model_path, device='cpu', trust_remote_code=True)

With this snippet, the model works fine on the workstation but gives the error below on the server:

/home/path/to/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/path/to/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/home/path/to/lib/python3.8/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json

As I understand it, the code should have used the local copy of config.json instead of requesting https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json.
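One way to make that expectation explicit is the libraries' documented offline mode: if these environment variables are set before importing sentence_transformers, huggingface_hub/transformers should use only local files and fail fast instead of calling huggingface.co (a minimal sketch; the variable names are from the huggingface_hub and transformers docs):

```python
import os

# Setting these BEFORE importing sentence_transformers forces the Hub
# client into offline mode: only local files are used, and any attempted
# download raises an error immediately instead of reaching huggingface.co.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

With offline mode on, any hidden Hub call should surface right away as an "offline mode is enabled" error, which at least confirms where the request is coming from.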

Both the local workstation and the server have the same versions of:
sentence-transformers 2.7.0
huggingface-hub==0.23.0

Workstation: Python 3.10.12, Ubuntu 22.04.4 LTS
Server1: Python 3.8.10, Ubuntu 20.04.5 LTS

I tried another server (Server 2), with Python 3.10.12, but hit the same issue there too.

What is going wrong in the server case? Any ideas/help, please?

Thanks.

NV-Embed-v1 is made for NeMo, a distributed cloud training framework.

It is the new #1 model on the MTEB leaderboard.

@GPT007 Salesforce/SFR-Embedding-2_R might be a new top model on the MTEB leaderboard, but our issue is different.

In our case, on our main server, the code is not using the locally downloaded gated model; instead it makes a call to Hugging Face, whereas the same code and model run perfectly fine on the local workstation. Isn't that weird?

I tried everything on the main server, from setting up different Docker containers to imitate the workstation's OS, environment, and packages, to making sure that the workstation and server have exactly the same Python version, pip packages, and environment variables. I don't know where the actual problem is.

PS: FYI, I also gave the new Salesforce/SFR-Embedding-2_R model a try. In our case, it gives a lot of false positives compared to nvidia/NV-Embed-v1.

OK, further forum searching helped.

Placing my token in the file:

~/.cache/huggingface/token 

worked for me.
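Presumably the workstation had a cached login while the server did not, and fetching the gated NV-Embed-v1 config needs an authenticated request even with trust_remote_code. A minimal Python sketch of the same fix (the token value is a placeholder; the path is huggingface_hub's default on Linux, assuming HF_HOME has not relocated it):

```python
from pathlib import Path

# Placeholder value; substitute your real Hugging Face access token.
token = "hf_xxx_your_token_here"

# Default token location that huggingface_hub checks on Linux
# (relocated if HF_HOME is set; this sketch assumes the default).
token_file = Path.home() / ".cache" / "huggingface" / "token"
token_file.parent.mkdir(parents=True, exist_ok=True)
token_file.write_text(token)
```

Running `huggingface-cli login` writes the same file and also validates the token against the Hub first.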

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.