I deployed Mistral 7B Instruct v0.3 with TGI using the official Docker container:
model=mistralai/Mistral-7B-Instruct-v0.3
# share a volume with the Docker container to avoid downloading weights every run
volume=$PWD/data
hf_token=<your_hf_api_token>
docker run --gpus all --shm-size 1g -e HF_TOKEN=$hf_token -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:2.2.0 --model-id $model --quantize bitsandbytes-nf4
I then created a HuggingFaceEndpoint and was able to interact with the Mistral model:
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="http://localhost:8080/",
    max_new_tokens=512,
)
llm.invoke("What is the capital of France?")
Working with the InferenceClient from huggingface_hub, or directly via curl or requests, works fine.
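For reference, a minimal sketch of how I query the server directly (the prompt is just an example):

from huggingface_hub import InferenceClient

# point the client at the local TGI server instead of a model id on the Hub
client = InferenceClient(model="http://localhost:8080")
print(client.text_generation("What is the capital of France?", max_new_tokens=64))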
Then I created a ChatHuggingFace, so that I don’t have to manually add the special tokens, etc.:
from langchain_huggingface import ChatHuggingFace

# model_id can't be inferred from the endpoint_url
llm_engine_hf = ChatHuggingFace(llm=llm, model_id="mistralai/Mistral-7B-Instruct-v0.3")
llm_engine_hf.invoke("Hugging Face is")
This fails because the JSON dict in the request does not have an inputs field, but a messages field.
The error description in the response is:
'Failed to deserialize the JSON body into the target type: missing field `inputs` at line 1 column 400'
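For comparison, here is a rough sketch of the two request shapes as I understand them (prompt text and token limits are just placeholders): TGI's /generate route expects an inputs field, while the OpenAI-compatible /v1/chat/completions route expects a messages field, which is what the request sent by ChatHuggingFace apparently contains.

import requests

# Shape accepted by TGI's /generate route (inputs + parameters):
requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "What is the capital of France?",
          "parameters": {"max_new_tokens": 64}},
)

# OpenAI-style chat payload with a messages field, accepted by /v1/chat/completions
# (the model value is just a placeholder for a single-model TGI server):
requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "tgi",
          "messages": [{"role": "user", "content": "What is the capital of France?"}],
          "max_tokens": 64},
)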
By the way, the actual error message printed to the console is not that helpful either:
huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: http://localhost:8080/
What am I missing? Is there some parameter that I have to change when instantiating the HuggingFaceEndpoint/ChatHuggingFace, or when deploying the model via TGI? I couldn't find anything in the official guides.