Intermittent 504 Gateway Timeout on Inference API for mixedbread-ai/mxbai-embed-large-v1

Summary

The Hugging Face Inference API intermittently returns 504 errors when generating embeddings with the model mixedbread-ai/mxbai-embed-large-v1. An identical request can succeed and then fail within a span of a few seconds.

Product

Hugging Face Inference API

Model

mixedbread-ai/mxbai-embed-large-v1

Impact

Embedding generation in a backend script is unreliable. Retries help only sporadically.
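For anyone hitting the same thing, here is a minimal sketch of retrying with exponential backoff and jitter, which tends to behave better than immediate retries against a gateway that times out intermittently. It is not taken from the original script: `call_with_retries`, its parameters, and the zero-argument `call` wrapper are all hypothetical names chosen for illustration.

```python
import random
import time


def call_with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` attempts have failed.

    `call` is any zero-argument function that raises on failure, for example
    a wrapper around an embeddings request (hypothetical, not from the report).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # in real code, catch only timeout/5xx errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ... plus up to 50% jitter.
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

The `sleep` argument is injectable only so the helper can be exercised without actually waiting. Whether backoff fully hides the 504s depends on what is failing on the server side, so treat this as a mitigation rather than a fix.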

Environment

  • Client: Node.js script calling HF Inference API

  • OS: macOS on developer machine

  • Auth: HF API key in Authorization header

  • Payload: short English sentence ("This is a test sentence for embedding generation.")

Minimal repro

  1. Use the HF Inference API for embeddings with the model above.

  2. Send multiple requests in sequence.

  3. Observe alternating success and 504 responses.
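The steps above could be scripted roughly as follows, so the success/failure ratio can be reported as numbers rather than impressions. This is a sketch in Python rather than the Node.js client from the report, and `tally_outcomes` and `send_request` are hypothetical names; `send_request` stands for any zero-argument function that performs one embedding request and raises on failure.

```python
from collections import Counter


def tally_outcomes(send_request, attempts=10):
    """Call `send_request()` `attempts` times in sequence and count outcomes.

    Successful calls are counted under "ok"; failures are counted under the
    name of the exception type that was raised.
    """
    outcomes = Counter()
    for _ in range(attempts):
        try:
            send_request()
            outcomes["ok"] += 1
        except Exception as exc:
            # Group failures by exception type so timeouts stand out from others.
            outcomes[type(exc).__name__] += 1
    return outcomes
```

With `huggingface_hub`, `send_request` could be a lambda wrapping `InferenceClient.feature_extraction` for the model above. Attaching the resulting counts to the report makes it easier to compare against other people's environments.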

Expected

Consistent 200 with embedding vector of length 1024.
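A small check like the one below can make the expected result explicit in a test script. It is only a sketch: `is_valid_embedding` is a hypothetical helper, and the 1024 default comes from the expected vector length stated above.

```python
import math


def is_valid_embedding(vector, expected_dim=1024):
    """Return True if `vector` is a flat sequence of `expected_dim` finite numbers."""
    # An HTML error page arrives as text, not as a vector, so reject it early.
    if isinstance(vector, (str, bytes)):
        return False
    try:
        values = [float(v) for v in vector]
    except (TypeError, ValueError):
        # Not iterable, or elements are not scalars (e.g. a nested batch).
        return False
    return len(values) == expected_dim and all(math.isfinite(v) for v in values)
```

A nested result (one vector per input in a batch) is rejected on purpose here; pass a single row if the client returns a batch.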

Actual

Roughly alternating success and failure. Failures return 504 with an HTML body labeled “Hugging Face - The AI community building the future.” and title “504 Gateway Timeout.”

There seemed to be an error with the API earlier, so maybe that was the cause…?
Currently, I'm not seeing any errors in my environment.

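import os
from huggingface_hub import InferenceClient

# Read the token from the environment if set, rather than hard-coding it.
HF_TOKEN = os.environ.get("HF_TOKEN", "hf_***my_read_token***")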

client = InferenceClient(
    provider="hf-inference",
    api_key=HF_TOKEN,
)

def infer():
    result = client.feature_extraction(
        "This is a test sentence for embedding generation.",
        model="mixedbread-ai/mxbai-embed-large-v1",
    )
    print(result)

# Send the same request five times in a row to check for intermittent failures.
for _ in range(5):
    infer()  # prints e.g. [ 0.10855082  0.2237774   0.06413455 ...  0.16363798  0.16282862  -0.40164134]