-inf values for logit score outputs with model.generate

I have been trying to obtain the logit values for each generated token as follows. However, each score vector I get back contains -inf in every position except the position of the predicted token. This happens when generation_config.do_sample = True. When generation_config.do_sample = False, the score vectors contain finite values everywhere.

I am confused, since I thought the scores when do_sample=True would need to be finite (not -inf) for non-deterministic token sampling to work at all.

Can someone please explain why this is happening?

Here is the code:

from transformers import AutoTokenizer, AutoModelForCausalLM

name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map='auto')

model.generation_config.do_sample = True
model.generation_config.temperature = 0.6

sen = "[INST] What is Sun? [/INST]"
inputs = tokenizer(sen, return_tensors='pt').to(0)

output = model.generate(**inputs, max_length=20, output_scores=True, return_dict_in_generate=True)
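For context on what I would have expected instead: my understanding is that sampling works over a probability distribution, and a -inf logit simply gives that token probability 0 after softmax, so -inf entries could in principle coexist with sampling if some filter (e.g. top-k/top-p style masking) discards tokens first. A minimal sketch of that masking idea in plain Python (my own illustration, not the transformers implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0,
    # so masked entries receive probability exactly 0.
    m = max(x for x in xs if x != -math.inf)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]

# Keep only the top-k tokens and mask the rest to -inf,
# mimicking a top-k style filter applied before sampling.
k = 2
threshold = sorted(logits, reverse=True)[k - 1]
masked = [x if x >= threshold else -math.inf for x in logits]

probs = softmax(masked)
print(masked)  # [2.0, 1.0, -inf, -inf]
print(probs)   # masked tokens get probability 0.0; the rest sum to 1
```

Under this view, sampling then draws only from the surviving tokens, which is why -inf values and sampling are not necessarily contradictory. What puzzles me is that in my output almost every position is -inf.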

Output screenshot: (not reproduced here; it shows score vectors that are -inf everywhere except at the sampled token)

Same issue! Could anyone please explain how to tackle this?