I have been trying to obtain the logit values for each generated token as shown below. However, when generation_config.do_sample = True, each score vector I get back is filled with -inf at every position except that of the predicted token. When generation_config.do_sample = False, the score vectors contain finite values throughout.
This confuses me: I expected the values with do_sample=True to be finite, since non-deterministic token sampling needs a full distribution to sample from.
Can someone please explain why this is happening?
Here is the code:
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map='auto')

# Enable sampling with a moderate temperature
model.generation_config.do_sample = True
model.generation_config.temperature = 0.6

sen = "[INST] What is Sun? [/INST]"
inputs = tokenizer(sen, return_tensors='pt').to(0)

# Ask generate() to return the per-step scores alongside the generated ids
output = model.generate(**inputs, max_length=20, output_scores=True, return_dict_in_generate=True)
print(output.scores)
Output screenshot:
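
For completeness, here is a minimal sketch of how I check the score tensors (it assumes the output object from the generate call above and only adds a torch import; the counting loop itself is just for illustration):

import torch

# output.scores is a tuple with one [batch_size, vocab_size] tensor per generated token.
# Count how many entries in each per-step score tensor are finite (i.e. not -inf).
for step, scores in enumerate(output.scores):
    finite = torch.isfinite(scores)
    print(f"step {step}: {finite.sum().item()} finite entries out of {scores.numel()}")

With do_sample=True this reports only a handful of finite entries per step, whereas with do_sample=False every entry is finite.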