[RESOLVED] Recommended way to use guidance on an Inference Endpoint?

Greetings all,

I would like to create an Inference Endpoint using the “guidance” features of the text-generation-inference container. According to the Guidance documentation, those features were introduced in text-generation-inference v1.4.3.
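For context, this is the kind of guidance (grammar-constrained) request I want to send. The endpoint URL, token, and schema below are placeholder assumptions; the `grammar` parameter shape follows the TGI Guidance docs for >= v1.4.3:

```python
import json

# Hypothetical endpoint URL -- replace with your Inference Endpoint's URL.
ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud/generate"

# Illustrative JSON schema the generated output should conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# TGI >= 1.4.3 accepts a "grammar" entry under "parameters" to
# constrain generation to the given JSON schema.
payload = {
    "inputs": "Extract the person's name and age: John is 30 years old.",
    "parameters": {
        "max_new_tokens": 64,
        "grammar": {"type": "json", "value": schema},
    },
}

# Sending it requires the `requests` package and a live endpoint:
# requests.post(ENDPOINT_URL, json=payload,
#               headers={"Authorization": "Bearer <HF_TOKEN>"})
print(json.dumps(payload, indent=2))
```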

The default Inference Endpoint image (“container-type”) appears to be pulled from a private repo (registry.internal.huggingface.tech/api-inference/community/text-generation-inference:gemma-ie), and reports its version as v1.4.1-native in the system_fingerprint property of its responses.
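This is how I read the version off a response; the response body below is an illustrative assumption, only the system_fingerprint field matters:

```python
# Sample response shape -- only system_fingerprint is relevant here.
sample_response = {
    "generated_text": "...",
    "system_fingerprint": "1.4.1-native",
}

# Strip the "-native" suffix to get the bare TGI version string.
version = sample_response["system_fingerprint"].split("-")[0]
print(version)  # -> 1.4.1
```

(TGI also exposes a GET /info route that reports version and model_id, which is a more direct check against a live endpoint.)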

When I specify a custom image pulled from the text-generation-inference GitHub registry (Package text-generation-inference · GitHub), the endpoint appears to serve the wrong model (bigscience/bloom-560m, even though I specified TinyLlama/TinyLlama-1.1B-Chat-v1.0 when creating the Inference Endpoint).

Is there a generally accepted / recommended way I can serve a current text-generation-inference container (>= v1.4.3) using an Inference Endpoint?

Update: The problem appears to have been on the Inference Endpoints side and now seems to be resolved: Inference Endpoints now use ghcr.io/huggingface/text-generation-inference:1.4.4 as the default image.

Apologies for the noise.