Generate() returns full prompt plus answer

Hi all,

Some models, when generating text, return the prompt plus the answer in the output. For instance, Mistral models and Phi-2 behave this way. Other models, such as Flan-T5, don't: their output contains only the generated text, without the prompt prepended. Is there a way to return only the generated text for models like Mistral or Phi-2? I've tried some solutions found online (e.g., the thread "tiiuae/falcon-40b-instruct · Model returns entire input prompt together with output" suggests setting `include_prompt_in_result` or `return_full_text` to False), but these arguments don't seem to exist on `generate()`.
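
For reference, here's a minimal snippet showing what I mean (using Phi-2 as an example; any decoder-only causal LM shows the same behavior):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)

# Prints the prompt followed by the continuation, not the answer alone
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```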

Thanks in advance.

Hi @alexszen

Decoder-only models like Mistral and Phi-2 generate a continuation of the input ids, so the returned sequence includes the prompt (encoder-decoder models like Flan-T5 emit only new tokens). You can exclude the prompt part, whose length in tokens is `prompt_length`, from the model's output and decode only the answer, as follows (reusing your `tokenizer`, `model`, and prompt from above):

inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(input_ids=inputs['input_ids'])

prompt_length = inputs['input_ids'].shape[1]

answer = tokenizer.decode(outputs[0][prompt_length:])
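
As a side note, the `return_full_text` flag you found does exist, but on the text-generation pipeline rather than on `generate()`. A minimal sketch, again assuming microsoft/phi-2 as the model:

```python
from transformers import pipeline

# return_full_text=False makes the pipeline strip the prompt from the output
generator = pipeline("text-generation", model="microsoft/phi-2")
result = generator("What is the capital of France?",
                   max_new_tokens=20, return_full_text=False)
print(result[0]["generated_text"])
```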