Inference Endpoints - No working code examples

mjd300 · January 3, 2025, 12:31pm

Hello all,

Trying to connect with javascript to an inference endpoint. The endpoint set up and running.

To clarify, I am refering to Inference Endpoints (dedicated), not Serverless API. It seems both in the docs and this forum people refer to Serverless API as Inference ENdpoints (or maybe it is just me).

The issue I am having is that every piece of code on Hugging Face does not work, except for the OpenAI API example (and that’s not an option for current project).

The code given on the Inference Endpoint page, which oddly doesn’t use the HF javascript library, does not connect o the endpoint correctly (404 error, no matter what variation of endpoint I try).

In the docs, this page: has this code:

const inference = new HfInference(‘hf_…’) // your user token

const gpt2 = inference.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2-endpoint')
const { generated_text } = await gpt2.textGeneration({ inputs: 'The answer to the universe is' })

Other than being odd ( ‘gpt2’ as a variable name and the example endpoint ends in ‘gpt2-endpoint’, which makes it less clear), it doesn’t work. It looks like someone’s copied this from the Serverless API example and not tested it.

Most of the errors are 404, which normally I’d assume was the endpoint itself. But I’ve tried all variations, and the same endpoint works fine if I use the OpenAI API example.

So right now, I’m paying for an endpoint (small model, so not a lot to be fair), and the only way to access it is via the OpenAI API.

I’m from the open source world, going back years. I appreciate I can rant, but I’d be much more useful if I helped fix it.

So, if anyone can give me an example of working javascript code that connects to an Inference Endpoint (chatbot is the focus), I’d love to solve my problem and also submit a much-needed update to the docs.

CylinderS · January 28, 2025, 12:48am

I’m having this same problem. Anyone solve this yet?

CylinderS · January 28, 2025, 3:45am

Well, I just realized in my case I’m running a GGUF in a Llama.cpp container, and maybe the OpenAI endpoint is the only one available in that container? The docs say this:

" You can deploy any llama.cpp compatible GGUF on the Hugging Face Endpoints. When you create an endpoint with a GGUF model, a llama.cpp container is automatically selected using the latest image built from the master branch of the llama.cpp repository. Upon successful deployment, a server with an OpenAI-compatible endpoint becomes available."

neopolita · January 29, 2025, 10:52am

You might want to use the /completion endpoint instead:

github.com/ggerganov/llama.cpp

examples/server/README.md

master

# LLaMA.cpp HTTP Server

Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/yhirose/cpp-httplib), [nlohmann::json](https://github.com/nlohmann/json) and **llama.cpp**.

Set of LLM REST APIs and a simple web front end to interact with llama.cpp.

**Features:**
 * LLM inference of F16 and quantized models on GPU and CPU
 * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
 * Reranking endoint (WIP: https://github.com/ggerganov/llama.cpp/pull/9510)
 * Parallel decoding with multi-user support
 * Continuous batching
 * Multimodal (wip)
 * Monitoring endpoints
 * Schema-constrained JSON response format

The project is under active development, and we are [looking for feedback and contributors](https://github.com/ggerganov/llama.cpp/issues/4216).

## Usage

This file has been truncated. show original

Topic		Replies	Views
Serverless inference issues for a new Go library Inference Endpoints on the Hub	4	31	March 18, 2025
Serverless Inference API error on new model Inference Endpoints on the Hub	5	346	September 9, 2024
HF Inference Endpoints Error 429 Inference Endpoints on the Hub	2	70	March 27, 2025
Request to Serverless Inference API failed with 400 status code Inference Endpoints on the Hub	2	229	March 4, 2025
Example Inference API (model & code ), pls Beginners	5	30	June 28, 2025

Inference Endpoints - No working code examples

Related topics