Serverless inference issues for a new Go library

I’m writing a new library in Go using the serverless inference API and I hit a few problems:

  • The documentation at Chat Completion is very focused on the Python library and doesn’t list much for the REST API, to the point that the URL format to use isn’t even listed. I use "https://router.huggingface.co/hf-inference/models/" + model + "/v1/chat/completions" (see the sketch after this list). I do not need OpenAI compatibility; whatever is closest to the native implementation is better for me.
  • When I make a mistake, I get a whole HTML page with <h1>503</h1> instead of an error message in JSON. That’s really hurting my progress. It seems there’s a reverse proxy on the router that is eating the error messages.
  • I failed to create a working test example that uses a JSON schema for a structured reply. What example (in any language) would you point me to? I see that Célina and Lucain recently updated the test case test_chat_completion_with_response_format() and it’s now skipped (huggingface_hub/tests/test_inference_client.py at main · huggingface/huggingface_hub · GitHub).
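For concreteness, here is the minimal Go sketch I’m working from, tying the three points together: it POSTs to the URL format above, asks for a JSON-schema-constrained reply, and guards against the HTML error pages by checking the Content-Type header before decoding. The model name is just a placeholder, the token is read from an HF_TOKEN environment variable, and the response_format shape ({"type": "json", "value": <schema>}, mirroring what the huggingface_hub test appears to send) is a guess on my part, not something I found documented.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {
	// Placeholder model name; any chat model on serverless inference.
	model := "meta-llama/Llama-3.1-8B-Instruct"
	url := "https://router.huggingface.co/hf-inference/models/" + model + "/v1/chat/completions"

	// Assumed request shape: OpenAI-style messages plus a TGI-style
	// response_format {"type": "json", "value": <JSON schema>}.
	// The exact response_format layout is unverified.
	payload := map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": "Name a city and its country."},
		},
		"response_format": map[string]any{
			"type": "json",
			"value": map[string]any{
				"type": "object",
				"properties": map[string]any{
					"city":    map[string]string{"type": "string"},
					"country": map[string]string{"type": "string"},
				},
				"required": []string{"city", "country"},
			},
		},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("HF_TOKEN"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// Workaround for the second point: the router sometimes answers with an
	// HTML error page, so check the Content-Type before decoding as JSON.
	if !strings.Contains(resp.Header.Get("Content-Type"), "application/json") {
		fmt.Printf("non-JSON response, status %d:\n%s\n", resp.StatusCode, raw)
		return
	}

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		panic(err)
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}
```

With the Content-Type check in place, the HTML 503 pages at least surface as readable errors instead of opaque JSON unmarshal failures.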

First of all, the Serverless Inference API is currently being completely overhauled, so if you have any questions about the broad changes that are coming, it would be better to ask them on the GitHub Issues page.

  • Library issue
  • Non-library issue

“documentation”

There is some.

“I get a whole HTML page with <h1>503</h1> instead of an error message in JSON”

Same here… :sob:


Thanks, that was super useful!

Looks like it’s half-cooked.

I’m waiting for google/gemma-3-4b-it to be properly supported on serverless inference so I can test it further, including its vision capabilities.


As for Gemma 3, we just have to be patient until this fork is merged into main. It probably won’t take that long.

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.