Greetings all,
I would like to create an Inference Endpoint using the “guidance” features of the text-generation-inference
container. According to the Guidance documentation, those features were introduced in text-generation-inference v1.4.3.
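For context, this is a minimal sketch of the kind of grammar-constrained request I'm hoping to send; the endpoint URL, token, and JSON schema are placeholders, not my actual setup:

```python
import requests

# Placeholder endpoint URL and token.
ENDPOINT_URL = "https://<my-endpoint>.endpoints.huggingface.cloud/generate"
HF_TOKEN = "hf_..."

payload = {
    "inputs": "Extract the name and age from: 'Alice is 31 years old.'",
    "parameters": {
        "max_new_tokens": 64,
        # The "grammar" parameter is the guidance feature added in TGI v1.4.3.
        "grammar": {
            "type": "json",
            "value": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json=payload,
)
print(resp.json())
```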
The default Inference Endpoint image (“container-type”) appears to be pulled from a private repo (registry.internal.huggingface.tech/api-inference/community/text-generation-inference:gemma-ie), and it reports its version as v1.4.1-native in the system_fingerprint property of its responses.
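This is roughly how I'm checking the version, assuming the OpenAI-compatible Messages API route; the URL and token are placeholders:

```python
import requests

# Placeholder endpoint URL and token.
ENDPOINT_URL = "https://<my-endpoint>.endpoints.huggingface.cloud/v1/chat/completions"
HF_TOKEN = "hf_..."

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
)
# The running TGI version shows up in system_fingerprint,
# e.g. "1.4.1-native" for the default image.
print(resp.json().get("system_fingerprint"))
```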
When I instead specify a custom image pulled from the text-generation-inference GitHub container registry (ghcr.io/huggingface/text-generation-inference), the endpoint appears to serve the wrong model (bigscience/bloom-560m), even though I specified TinyLlama/TinyLlama-1.1B-Chat-v1.0 when creating the Inference Endpoint.
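For reference, this is approximately the configuration I used, expressed as the equivalent huggingface_hub call; the instance details are placeholders, and I may well be missing something (e.g. an env var the custom image needs):

```python
from huggingface_hub import create_inference_endpoint

# Approximate equivalent of what I configured in the UI; instance
# details are placeholders and the env block may be incomplete.
endpoint = create_inference_endpoint(
    name="tinyllama-guidance-test",
    repository="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a10g",
    type="protected",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:1.4.3",
        # I'm unsure whether additional env vars are required here for a
        # custom image (e.g. a MODEL_ID pointing at the downloaded repository).
        "env": {},
    },
)
endpoint.wait()
print(endpoint.url)
```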
Is there a generally accepted or recommended way to serve a current text-generation-inference container (>= v1.4.3) on an Inference Endpoint?