Greetings all,
I would like to create an Inference Endpoint using the “guidance” features of the text-generation-inference
container. According to the Guidance documentation, those features were introduced in text-generation-inference v1.4.3.
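For context, this is a minimal sketch of the kind of grammar-constrained request I'm hoping to send; the endpoint URL, token, and JSON schema are placeholders, not my actual setup:

```python
import requests

# Placeholder endpoint URL and token.
ENDPOINT_URL = "https://<my-endpoint>.endpoints.huggingface.cloud/generate"
HF_TOKEN = "hf_..."

payload = {
    "inputs": "Extract the name and age from: 'Alice is 31 years old.'",
    "parameters": {
        "max_new_tokens": 64,
        # The "grammar" parameter is the guidance feature added in TGI v1.4.3.
        "grammar": {
            "type": "json",
            "value": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json=payload,
)
print(resp.json())
```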
The default Inference Endpoint image (“container-type”) appears to be pulled from a private repo (registry.internal.huggingface.tech/api-inference/community/text-generation-inference:gemma-ie), and it reports its version as v1.4.1-native in the system_fingerprint property of its responses.
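This is roughly how I'm checking the version, assuming the OpenAI-compatible Messages API route; the URL and token are placeholders:

```python
import requests

# Placeholder endpoint URL and token.
ENDPOINT_URL = "https://<my-endpoint>.endpoints.huggingface.cloud/v1/chat/completions"
HF_TOKEN = "hf_..."

resp = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8,
    },
)
# The running TGI version shows up in system_fingerprint,
# e.g. "1.4.1-native" for the default image.
print(resp.json().get("system_fingerprint"))
```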
When I instead specify a custom image pulled from the text-generation-inference GitHub container registry (ghcr.io/huggingface/text-generation-inference), the endpoint appears to serve the wrong model (bigscience/bloom-560m), even though I specified TinyLlama/TinyLlama-1.1B-Chat-v1.0 when creating the Inference Endpoint.
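For reference, this is approximately the configuration I used, expressed as the equivalent huggingface_hub call; the instance details are placeholders, and I may well be missing something (e.g. an env var the custom image needs):

```python
from huggingface_hub import create_inference_endpoint

# Approximate equivalent of what I configured in the UI; instance
# details are placeholders and the env block may be incomplete.
endpoint = create_inference_endpoint(
    name="tinyllama-guidance-test",
    repository="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    instance_size="x1",
    instance_type="nvidia-a10g",
    type="protected",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:1.4.3",
        # I'm unsure whether additional env vars are required here for a
        # custom image (e.g. a MODEL_ID pointing at the downloaded repository).
        "env": {},
    },
)
endpoint.wait()
print(endpoint.url)
```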
Is there a generally accepted or recommended way to serve a current text-generation-inference container (>= v1.4.3) on an Inference Endpoint?