Pipeline and Hosted Inference API unable to load private LM-boosted ASR model

Hello,
I have uploaded a private LM-boosted ASR model to the Hub. However, the Hosted Inference API is not able to load it.

So I created a Gradio app in a private Space to access the model instead. I created an access token for the model and stored it as a repo secret in my Space.

However, the pipeline is not able to load the model successfully even when I pass a token to it. Here is the relevant snippet:

import os
from transformers import pipeline

token_key = os.environ.get("HUGGING_FACE_HUB_TOKEN")
p = pipeline("automatic-speech-recognition", model=model_name, use_auth_token=token_key)

The log shows that loading fails when accessing the alphabet.json file, as if the token were not being applied to the LM files. The model and tokenizer files themselves load successfully.
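
If I read the trace correctly, the 401 comes from pyctcdecode's load_from_hf_hub, which calls snapshot_download without forwarding the token. A minimal sketch that should reproduce the same failure on its own (my assumption, not verified):

from huggingface_hub import snapshot_download

# Downloading a private repo snapshot without a token fails with a 401,
# which looks like exactly what pyctcdecode hits in the trace below.
snapshot_download(model_name)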

Here is the last part of the traceback from the log:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 131, in _raise_for_status
    response.raise_for_status()
  File "/home/user/.local/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/unilux/wav2vec-xlsr-300m-Luxembourgish-with-LM/resolve/2586619a585002ca896cea9e92d2d0de15540206/alphabet.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 12, in <module>
    p = pipeline("automatic-speech-recognition", model=model_name, use_auth_token=True)
  File "/home/user/.local/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 818, in pipeline
    decoder = BeamSearchDecoderCTC.load_from_hf_hub(model_name, allow_regex=allow_regex)
  File "/home/user/.local/lib/python3.8/site-packages/pyctcdecode/decoder.py", line 827, in load_from_hf_hub
    cached_directory = snapshot_download(  # pylint: disable=not-callable
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_deprecation.py", line 93, in inner_f
    return f(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 192, in snapshot_download
    _ = hf_hub_download(
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1099, in hf_hub_download
    _raise_for_status(r)
  File "/home/user/.local/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 169, in _raise_for_status
    raise e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error: Repository Not Found for url: https://huggingface.co/unilux/wav2vec-xlsr-300m-Luxembourgish-with-LM/resolve/2586619a585002ca896cea9e92d2d0de15540206/alphabet.json. If the repo is private, make sure you are authenticated. (Request ID: d4rxchN6cnQxRihE8r6q2)

I don't know whether this is a bug, but the issue only occurs with private LM-boosted models. Other private models work fine with both the Hosted Inference API and the Gradio Space.
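
In the meantime, the workaround I am considering is to pre-download the whole repo with an authenticated snapshot_download and then point the pipeline at the local directory, so that pyctcdecode never has to contact the Hub itself. A sketch, assuming snapshot_download accepts use_auth_token in my version of huggingface_hub:

import os
from huggingface_hub import snapshot_download
from transformers import pipeline

token_key = os.environ.get("HUGGING_FACE_HUB_TOKEN")

# Fetch all repo files (model, tokenizer, and the LM files such as
# alphabet.json) with the token, then load the pipeline from disk.
local_dir = snapshot_download(model_name, use_auth_token=token_key)
p = pipeline("automatic-speech-recognition", model=local_dir)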

Am I missing any configuration? Please advise.

Thanks in advance!