I’m consistently encountering issues with the HuggingFace Serverless API, specifically with the hf-inference provider. When running the sample code on the HuggingFace serverless deployment page, I receive the following error:
openai.BadRequestError: Error code: 400 - {'error': 'Not allowed to request v1/chat/completions for provider hf-inference'}
I tried the following models, all with the same error:
meta-llama/Llama-3.3-70B-Instruct
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Does anyone have any ideas or suggestions on how to resolve this issue?
It looks like you’re encountering a 400 Bad Request error when trying to access the Serverless Inference API. This usually happens due to incorrect input, missing parameters, or a malformed request. I’d recommend double-checking the request payload, headers, and any required parameters, and making sure the API endpoint you’re hitting is correct and your authorization token is valid. Let me know if you’d like help troubleshooting further!
You’re not alone—this seems to be a common issue when using the HuggingFace Serverless API via the hf-inference provider. The error:
openai.BadRequestError: Error code: 400 - {'error': 'Not allowed to request v1/chat/completions for provider hf-inference'}
usually indicates a mismatch between the endpoint you’re hitting (v1/chat/completions, which is the OpenAI-compatible chat endpoint) and the endpoints the hf-inference provider actually supports.
The hf-inference provider doesn’t support the OpenAI-compatible chat endpoint (v1/chat/completions) directly. Instead, it expects payloads specific to Hugging Face’s inference API. You’ll need to adapt your code to match Hugging Face’s format or use an OpenAI-compatible provider that supports the /v1/chat/completions endpoint properly.
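As a rough sketch of what “adapting to Hugging Face’s format” can look like: the snippet below converts OpenAI-style chat messages into the `{"inputs": ..., "parameters": ...}` payload shape used by the native Inference API and builds the POST request with only the standard library. The base URL, the prompt-flattening scheme, and the parameter choices here are my assumptions for illustration, not the exact code from the HF docs; check the docs for your model’s expected input format.

```python
import json
from urllib import request

# Native HF Inference API endpoint pattern (assumption; verify in the HF docs).
HF_API_URL = "https://api-inference.huggingface.co/models/{model}"

def to_hf_payload(messages, max_new_tokens=256):
    """Flatten OpenAI-style chat messages into a single prompt string and
    wrap it in the {"inputs": ..., "parameters": ...} structure that the
    hf-inference provider expects (naive role-prefix template, illustrative)."""
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "return_full_text": False},
    }

def build_request(model, token, messages):
    """Build (but do not send) the POST request for the given model."""
    payload = to_hf_payload(messages)
    return request.Request(
        HF_API_URL.format(model=model),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage: send with urllib.request.urlopen(req) once you fill in a real token.
req = build_request(
    "meta-llama/Llama-3.3-70B-Instruct",
    "hf_xxx",  # your HF access token
    [{"role": "user", "content": "Hello!"}],
)
```

Alternatively, the `huggingface_hub` library’s `InferenceClient` exposes a `chat_completion` method that handles this translation for you, which may be the simpler route if you don’t want to manage raw payloads.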
If you’re looking for a smooth, OpenAI-compatible experience with models like LLaMA or DeepSeek, you might want to try Cyfuture AI. They offer OpenAI-compatible APIs for several high-performing open-source models (like LLaMA 3 and DeepSeek) and can be a great drop-in alternative if you’re running into provider limitations with Hugging Face.
Let me know if you need help modifying your code for Hugging Face or switching providers—happy to share examples.