I am using an Inference Endpoint with mistralai/Mistral-7B-Instruct-v0.1. The output is truncated: for instance, the proposed test query "Can you please let us know more details about your " yields “2019 Honda CR-V Touring?\n\n1. What is the mile”. How can I adjust the output size?
Have you tried modifying the ‘max_tokens’ parameter?
No I haven’t. Is this parameter documented somewhere? Using the following yields the same result:
# query() is the request helper from the Inference Endpoint example snippet
output = query({
    "inputs": "Can you please let us know more details about your ",
    "parameters": {
        "max_tokens": 128
    }
})
You need the correct prompt template for the model:
<s>[INST] Can you please let us know more details about your [/INST]
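For example, reusing the query helper from your snippet (just a sketch; note that the text-generation parameter is max_new_tokens rather than max_tokens):

output = query({
    "inputs": "<s>[INST] Can you please let us know more details about your [/INST]",
    "parameters": {
        "max_new_tokens": 128
    }
})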
I recommend using our Python client plus prompt templates via transformers.
Here is a code snippet example:
pip install transformers jinja2 huggingface-hub

from transformers import AutoTokenizer
from huggingface_hub import InferenceClient

# The tokenizer provides the model's chat template; the client runs remote inference
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
client = InferenceClient("mistralai/Mistral-7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "Can you please let us know more details about your "},
]

# Format the messages with the model's "<s>[INST] ... [/INST]" template, without tokenizing
prompt_encoded = tokenizer.apply_chat_template(messages, tokenize=False)

# max_new_tokens controls how many tokens are generated
output = client.text_generation(prompt_encoded, max_new_tokens=200)
print(output)
Note that the model argument can also point to a deployed Inference Endpoint:
model (str, optional) — The model to run inference with. Can be a model id hosted on the Hugging Face Hub, e.g. bigcode/starcoder or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task.
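For instance (the URL below is a placeholder for your own endpoint):

from huggingface_hub import InferenceClient

# Point the client at a deployed Inference Endpoint instead of a Hub model id
client = InferenceClient("https://<your-endpoint>.endpoints.huggingface.cloud")
output = client.text_generation(prompt_encoded, max_new_tokens=200)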
Ok I see, it works now with the prompt template. Thanks!