HF Inference API: 503/504 Server Error

Hello,
I am trying to run inference as shown in the code snippet below, but I always get a 503 or 504 server error. I've tried different models and different ways of calling the Inference API, and I always run into the same problem. The issue also appears to be specific to my HF account, since others have been able to run the exact same code without problems. Does anyone know what could be going wrong?

CODE SNIPPET

from huggingface_hub import InferenceClient

# Initialize the Hugging Face InferenceClient
client = InferenceClient(
    model="facebook/opt-1.3b",
    token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
)

result = client.text_generation(
    prompt="Hello you are a chatbot, answer this ",
    model="facebook/opt-1.3b",
)
print(result)

ERROR

HTTPError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_http.py in hf_raise_for_status(response, endpoint_name)
408 try:
→ 409 response.raise_for_status()
410 except HTTPError as e:

HTTPError: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/hf-inference/models/facebook/opt-1.3b

The above exception was the direct cause of the following exception:

HfHubHTTPError Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_http.py in hf_raise_for_status(response, endpoint_name)
479 # Convert HTTPError into a HfHubHTTPError to display request information
480 # as well (request id and/or server error message)
→ 481 raise _format(HfHubHTTPError, str(e), response) from e
482
483

HfHubHTTPError: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/hf-inference/models/facebook/opt-1.3b


The API seems to be in a bad state at the moment.

from huggingface_hub import InferenceClient

#model_id = "facebook/opt-1.3b" # No response for a long time...
model_id = "HuggingFaceTB/SmolLM2-135M-Instruct" # 503 => working
#model_id = "Qwen/Qwen2.5-3B-Instruct" # 503 => no response for a long time...

HF_TOKEN = "hf_my_pro_token***"

# Initialize the Hugging Face InferenceClient
client = InferenceClient(
    model=model_id,
    token=HF_TOKEN,
    provider="hf-inference",
    timeout=600,
)

result = client.text_generation(
    prompt="Hello you are a chatbot, answer this ",
    model=model_id,
)

print(result)
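Until the backend recovers, one workaround is to retry automatically when a 503/504 comes back, since these errors are often transient (e.g. the model is still loading). Below is a minimal sketch of a generic retry-with-exponential-backoff helper; the commented usage at the bottom is hypothetical (model id and token are placeholders), and it assumes `HfHubHTTPError` — the exception shown in the traceback above — is the right thing to catch:

```python
import time

def retry_with_backoff(fn, retries=5, base_delay=2.0, retry_on=(Exception,)):
    """Call fn(); on a listed exception, sleep and retry with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts, re-raise the last error
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...

# Hypothetical usage against the Inference API (token/model are placeholders):
#
# from huggingface_hub import InferenceClient
# from huggingface_hub.utils import HfHubHTTPError
#
# client = InferenceClient(model="HuggingFaceTB/SmolLM2-135M-Instruct", token="hf_...")
# result = retry_with_backoff(
#     lambda: client.text_generation("Hello you are a chatbot, answer this "),
#     retry_on=(HfHubHTTPError,),
# )
```

This won't help if the account itself is flagged or the backend is down for good, but it papers over the intermittent gateway timeouts.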