Serverless inference issues for a new Go library

I’m writing a new library in Go using the serverless inference API and I hit a few problems:

  • The documentation at Chat Completion is very focused on the Python library and doesn’t list much for the REST API, to the point that the URL format to use isn’t even listed. I use "https://router.huggingface.co/hf-inference/models/" + model + "/v1/chat/completions" (see the sketch after this list). I do not need OpenAI compatibility; whatever is closest to the native implementation is better for me.
  • When I make a mistake, I get a whole HTML page with <h1>503</h1> instead of an error message in JSON. That’s really hurting my progress. It seems there’s a reverse proxy on the router that is eating the error messages.
  • I failed to create a working test example that uses a JSON schema for a structured reply. What example (in any language) would you point me to? I see that Célina and Lucain recently updated the test case test_chat_completion_with_response_format() and it’s now skipped (huggingface_hub/tests/test_inference_client.py at main · huggingface/huggingface_hub · GitHub).
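For concreteness, here is the minimal Go sketch I’m working from, tying the three points together: it POSTs to the URL format above, asks for a JSON-schema-constrained reply, and guards against the HTML error pages by checking the Content-Type header before decoding. The model name is just a placeholder, the token is read from an HF_TOKEN environment variable, and the response_format shape ({"type": "json", "value": <schema>}, mirroring what the huggingface_hub test appears to send) is a guess on my part, not something I found documented.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Placeholder model name; any chat model on serverless inference.
	model := "meta-llama/Llama-3.1-8B-Instruct"
	url := "https://router.huggingface.co/hf-inference/models/" + model + "/v1/chat/completions"

	// Assumed request shape: OpenAI-style messages plus a TGI-style
	// response_format {"type": "json", "value": <JSON schema>}.
	// The exact response_format layout is unverified.
	payload := map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": "Name a city and its country."},
		},
		"response_format": map[string]any{
			"type": "json",
			"value": map[string]any{
				"type": "object",
				"properties": map[string]any{
					"city":    map[string]string{"type": "string"},
					"country": map[string]string{"type": "string"},
				},
				"required": []string{"city", "country"},
			},
		},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("HF_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// Workaround for the second point: the router sometimes answers with an
	// HTML error page, so check the Content-Type before decoding as JSON.
	if !strings.Contains(resp.Header.Get("Content-Type"), "application/json") {
		fmt.Printf("non-JSON response, status %d:\n%s\n", resp.StatusCode, raw)
		return
	}

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		panic(err)
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}
```

With the Content-Type check in place, the HTML 503 pages at least surface as readable errors instead of opaque JSON unmarshal failures.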

First of all, the Serverless Inference API is currently being completely overhauled, so if you have any questions about the broad changes that are coming, it would be better to ask them on the GitHub Issues page.

  • Library issue
  • Non-library issue

“documentation”

There is some.

“I get a whole HTML page with <h1>503</h1> instead of an error message in JSON”

Same here… :sob:


Thanks, that was super useful!

Looks like it’s half-cooked.

I’m waiting for google/gemma-3-4b-it to be properly supported on serverless inference so I can test it further, including its vision capabilities.


As for Gemma 3, we just have to be patient until this fork is merged into main. It probably won’t take that long.

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.