Truncated output on mistralai/Mistral-7B-Instruct-v0.1

I am using an inference endpoint on mistralai/Mistral-7B-Instruct-v0.1. The output is truncated; for instance, the proposed test query "Can you please let us know more details about your " yields "2019 Honda CR-V Touring?\n\n1. What is the mile". How can I adjust the output size?

Have you tried modifying the max_tokens parameter?

No I haven't. Is this parameter documented somewhere? Using the following yields the same result.

output = query({
  "inputs": "Can you please let us know more details about your ",
  "parameters": {
    "max_tokens": 128
  }
})
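
(For reference, query here is the usual requests-based helper from the endpoint snippet; the URL and token below are placeholders for my actual deployment.)

import requests

API_URL = "https://my-endpoint.endpoints.huggingface.cloud"  # placeholder endpoint URL
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder Hugging Face token

def query(payload):
    # Send the JSON payload to the Inference Endpoint and return the decoded response
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()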

You need to use the correct prompt template for this model:

<s>[INST] Can you please let us know more details about your [/INST]
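
Also, if you stay with the raw endpoint payload, the text-generation parameter is max_new_tokens rather than max_tokens. A minimal sketch, assuming the same query helper as above and the templated prompt:

output = query({
    "inputs": "<s>[INST] Can you please let us know more details about your  [/INST]",
    "parameters": {
        "max_new_tokens": 128
    }
})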

That said, I recommend using our Python client plus prompt templates via transformers.

Here is a code snippet example:

pip install transformers jinja2 huggingface-hub

from transformers import AutoTokenizer
from huggingface_hub import InferenceClient

# Load the tokenizer so we can apply the model's chat template
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Can you please let us know more details about your "},
]

# Format the conversation with the Mistral [INST] ... [/INST] template
prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# max_new_tokens controls how much text is generated
output = client.text_generation(prompt, max_new_tokens=200)
print(output)

Note that the model argument can also be a deployed Inference Endpoint. From the InferenceClient documentation:

model (str, optional) — The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. bigcode/starcoder or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task.
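
So, as a sketch, you can point the client directly at your endpoint URL instead of the model id (the URL below is a placeholder):

from huggingface_hub import InferenceClient

# Placeholder Inference Endpoint URL -- replace with your own deployment
client = InferenceClient("https://my-endpoint.endpoints.huggingface.cloud")

output = client.text_generation(
    "<s>[INST] Can you please let us know more details about your  [/INST]",
    max_new_tokens=200,
)
print(output)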

Ok I see, it works now with the prompt template. Thanks!
