LLMs Return Prompt as Well as Generated Text

Hi,
I am running inference on the following models: “NousResearch/Llama-2-7b-chat-hf”, “NousResearch/Llama-2-7b-hf”, and “lmsys/vicuna-7b-v1.5”. To extract the text generated by each LLM, I use: model_response = sequences[0]['generated_text'].

The response I get for all three models always contains the input prompt AND the model’s generated text. Is there a way (through inference settings) to make the returned text contain only the model’s generated text, without the prompt?
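For reference, a minimal sketch of the kind of setup I’m running (the prompt and generation arguments here are just placeholders):

```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "NousResearch/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

generator = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain what a tokenizer does in one sentence."
sequences = generator(prompt, max_new_tokens=128, do_sample=True)

# This string contains the prompt followed by the model's continuation.
model_response = sequences[0]["generated_text"]
```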

No, at least not that I have seen. But you can just cut the prompt out of the generated text.
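For example, assuming the prompt and sequences variables from the question, a simple sketch would be to strip the prompt prefix from the returned string:

```python
full_text = sequences[0]["generated_text"]

# The pipeline output starts with the prompt, so drop that prefix if present.
if full_text.startswith(prompt):
    model_response = full_text[len(prompt):].lstrip()
else:
    model_response = full_text
```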


Yes, currently generate() doesn’t allow returning only the generated text; its output always includes the prompt tokens. That feature might be added during the generate refactor.
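If you call generate() directly rather than using the pipeline, one workaround is to slice off the prompt tokens before decoding. A rough sketch (model name, prompt, and generation arguments are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "NousResearch/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain what a tokenizer does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# generate() returns prompt + continuation, so keep only the newly generated tokens.
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
model_response = tokenizer.decode(new_tokens, skip_special_tokens=True)
```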
