Hi,
I am running inference with the following models: "NousResearch/Llama-2-7b-chat-hf", "NousResearch/Llama-2-7b-hf", and "lmsys/vicuna-7b-v1.5". To extract the text generated by each LLM, I use `model_response = sequences[0]['generated_text']`.
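
For reference, here is a minimal sketch of my setup, assuming the standard transformers text-generation pipeline; the model name, prompt, and generation parameters below are placeholders rather than my exact values:

```python
import torch
import transformers

model_id = "NousResearch/Llama-2-7b-chat-hf"  # same pattern for the other two models

pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "What is the capital of France?"
sequences = pipe(prompt, max_new_tokens=128, do_sample=True)

# This string currently contains the prompt followed by the generation.
model_response = sequences[0]["generated_text"]
print(model_response)
```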
The response I get back from all three models always contains the input prompt followed by the model's generated text. Is there an inference setting that would make the returned text contain only the model's generated text, without the prompt?