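I am trying out “meta-llama/Llama-2-7b-chat-hf”:
import torch
import transformers

model_name = "meta-llama/Llama-2-7b-chat-hf"
# tokenizer is only used for eos_token_id below; loaded here so the snippet is complete
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
pipeline = transformers.pipeline("text-generation",
                                 model=model_name,
                                 torch_dtype=torch.float16,
                                 device_map="auto")
# prompt is a string defined elsewhere
sequences = pipeline(prompt,
                     num_return_sequences=1,
                     temperature=5.0, top_p=1.0, top_k=0,
                     eos_token_id=tokenizer.eos_token_id,
                     max_length=1000)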
I set temperature=5.0, which is pretty high, to make sure I get varied results. Curiously, I am getting the exact same output from Llama on every run. I believe my top_p and top_k are set correctly, but I could be wrong. Has anyone observed the same?
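One thing worth checking, as a guess rather than a confirmed diagnosis: if generation ends up running greedy decoding, temperature cannot change the result, because dividing the logits by a positive constant never changes which token has the highest score. Here is a minimal standalone sketch of that effect. The logits are made-up numbers for four hypothetical tokens, and the helper names (softmax, greedy_pick, sample_pick) are mine, not part of transformers.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits, temperature):
    """Greedy decoding: always take the most probable token."""
    probs = softmax(logits, temperature)
    return max(range(len(probs)), key=probs.__getitem__)

def sample_pick(logits, temperature, rng):
    """Sampling: draw a token index according to the probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four hypothetical tokens
rng = random.Random(0)         # fixed seed so the run is repeatable

# Greedy picks the same token at every temperature.
greedy_picks = {greedy_pick(logits, t) for t in (0.5, 1.0, 5.0)}
# Sampling at T=5 spreads the picks over several tokens.
sampled_picks = {sample_pick(logits, 5.0, rng) for _ in range(200)}

print("greedy:", greedy_picks)
print("sampled at T=5:", sampled_picks)
```

In transformers, sampling is switched on with do_sample=True, which can be passed in the pipeline call alongside temperature, top_p and top_k. Without it, generate does not use those three settings. Whether sampling is already enabled in this setup depends on the generation config that ships with the model, so that part is an assumption to verify.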