Here is my script:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the Llama-2 7B chat model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = """
CONTEXT: Harvard University is a private Ivy League research university in Cambridge, Massachusetts.
Founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard, it is the oldest institution of higher learning in the United States. Its influence, wealth, and rankings have made it one of the most prestigious universities in the world.
QUESTION: Which year was Harvard University founded?
"""

# Tokenize the prompt and generate up to 200 new tokens.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
Here is the output:
<s>
CONTEXT: Harvard University is a private Ivy League research university in Cambridge, Massachusetts.
Founded in 1636 as Harvard College and named for its first benefactor, the Puritan clergyman John Harvard,
it is the oldest institution of higher learning in the United States. Its influence, wealth,
and rankings have made it one of the most prestigious universities in the world.
QUESTION: Which year was Harvard University founded?
ANSWER: Harvard University was founded in 1636.</s>
How should I prompt the model so that it returns just the answer instead of repeating the input prompt?
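For reference, one workaround I'm aware of is post-processing rather than prompting: with a decoder-only model, generate() returns the prompt tokens followed by the continuation, so the prompt can be sliced off before decoding. A minimal sketch, assuming a batch size of 1:

# generate() returns the prompt tokens plus the newly generated ones,
# so skip the first input_ids.shape[1] tokens before decoding.
prompt_length = input_ids.shape[1]
answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(answer)  # e.g. "ANSWER: Harvard University was founded in 1636."

This works, but I'd still like to know whether there is a prompt-level fix.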
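Separately, since this is the chat-tuned checkpoint, I suspect the prompt should go through Llama-2's [INST] instruction format rather than being fed raw. A hypothetical sketch, assuming a transformers version recent enough that this tokenizer ships a chat template:

messages = [{"role": "user", "content": prompt}]
# apply_chat_template wraps the message in Llama-2's [INST] ... [/INST] format.
chat_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
outputs = model.generate(chat_ids, max_new_tokens=200)
# The slicing trick above is still needed to drop the echoed prompt tokens.
print(tokenizer.decode(outputs[0][chat_ids.shape[1]:], skip_special_tokens=True))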