Intermittent 504 Gateway Timeout on Inference API for mixedbread-ai/mxbai-embed-large-v1

hnpfhnpf · September 4, 2025, 9:01am

Summary

Intermittent 504 errors from the Hugging Face Inference API when generating embeddings with the model mixedbread-ai/mxbai-embed-large-v1. The same request, sent seconds apart, sometimes succeeds and sometimes fails.

Product

Hugging Face Inference API

Model

mixedbread-ai/mxbai-embed-large-v1

Impact

Embedding generation in a backend script is unreliable. Retries help only sporadically.

Environment

Client: Node.js script calling HF Inference API
OS: macOS on developer machine
Auth: HF API key in Authorization header
Payload: short English sentence ("This is a test sentence for embedding generation.")

Minimal repro

Use the HF Inference API for embeddings with the model above.
Send multiple requests in sequence.
Observe alternating success and 504 responses.
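
The repro steps above can be turned into a small counting loop, which makes the success/failure ratio easier to report. This is only a sketch: `tally_outcomes` and `status_of` are hypothetical helper names, and the code assumes that the client raises an exception carrying a `.response.status_code` attribute on HTTP errors. If the client in use signals errors differently, the failure is counted under the exception's class name instead.

```python
from collections import Counter
from typing import Callable, Optional


def status_of(exc: Exception) -> Optional[int]:
    """Best-effort HTTP status from an exception that carries a `.response`.

    Assumption: the HTTP client attaches the response to the exception.
    Returns None when no status code can be found.
    """
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None)


def tally_outcomes(call: Callable[[], object], attempts: int = 10) -> Counter:
    """Run `call` sequentially and count successes and failures."""
    outcomes: Counter = Counter()
    for _ in range(attempts):
        try:
            call()
            outcomes["ok"] += 1
        except Exception as exc:  # broad on purpose: this is a diagnostic loop
            # Key by HTTP status when available, else by exception class name.
            outcomes[status_of(exc) or type(exc).__name__] += 1
    return outcomes


# Hypothetical usage with a huggingface_hub InferenceClient named `client`:
#   tally_outcomes(lambda: client.feature_extraction(
#       "This is a test sentence for embedding generation.",
#       model="mixedbread-ai/mxbai-embed-large-v1"))
```

Posting the resulting counts together with the time of the run gives others something concrete to compare against.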

Expected

Consistent 200 with embedding vector of length 1024.

Actual

Roughly alternating success and failure. Failures return 504 with an HTML body labeled “Hugging Face - The AI community building the future.” and title “504 Gateway Timeout.”
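
Since plain retries help only sporadically, one possible client-side mitigation is to retry with exponential backoff and jitter, and only on gateway-type statuses. The sketch below is not an official recommendation. `call_with_backoff`, the set of retryable statuses, and the delay values are all assumptions chosen for illustration, and it again assumes the raised exception exposes `.response.status_code`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these statuses are transient and worth retrying.
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient gateway errors with backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            response = getattr(exc, "response", None)
            status = getattr(response, "status_code", None)
            is_last = attempt == max_attempts - 1
            # Re-raise anything that is not a transient gateway error,
            # and give up once the attempt budget is spent.
            if status not in RETRYABLE_STATUSES or is_last:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus up to 50% random jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting. This does not address the server-side cause of the 504 responses; it only makes a backend script more tolerant of them.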

John6666 · September 4, 2025, 9:44am

Earlier, it seemed like there was an error with the API, so maybe this is it…?
Currently, no errors seem to be occurring in my environment.

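import os
from huggingface_hub import InferenceClient

# Read the token from the environment; the fallback is a placeholder, not a real token.
HF_TOKEN = os.environ.get("HF_TOKEN", "hf_***my_read_token***")

client = InferenceClient(
    provider="hf-inference",
    api_key=HF_TOKEN,
)

def infer():
    # Request an embedding for the same test sentence used in the report.
    result = client.feature_extraction(
        "This is a test sentence for embedding generation.",
        model="mixedbread-ai/mxbai-embed-large-v1",
    )
    print(result)

# Send five requests in sequence to check for intermittent failures.
for i in range(5):
    infer() # [ 0.10855082  0.2237774   0.06413455 ...  0.16363798  0.16282862  -0.40164134]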
