How can I obtain the logits via model.generate()?

Hi, I am running the Meta-Llama-3-8B-Instruct model with the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'meta-llama/Meta-Llama-3-8B-Instruct'

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

... # processing data

inputs = tokenizer(prompt, return_tensors='pt').to("cuda")  # prompt is my instruction in natural language
output = model.generate(**inputs, max_new_tokens=40)

And the output is:

tensor([[ 45147,  31868,  65562,  ...,   7566, 128009, 128001]],
       device='cuda:0')
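
For readability, those ids can be decoded back into text (decoded_text is just an illustrative name):

decoded_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_text)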

Those generated token ids are what I want, but I also need the logits of the Llama-3 model. How can I get them? One option is to call the model() method directly, but that only runs a forward pass and does not produce the generated output above.
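
For context, calling the model directly runs a single forward pass: it returns logits for every position of the prompt but does not sample any new tokens. A minimal sketch, reusing model and inputs from above (forward_out is an illustrative name):

with torch.no_grad():
    forward_out = model(**inputs)

# forward_out.logits has shape (batch_size, prompt_length, vocab_size);
# forward_out.logits[:, -1, :] would be the logits for the first token to generate
print(forward_out.logits.shape)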


How about this?

Thank you, I just found the solution:

output = model.generate(**inputs, max_new_tokens=40, return_dict_in_generate=True, output_scores=True)

sequence = output.sequences  # generated token ids, prompt included
scores = output.scores  # per-step scores for the generated tokens, one tensor per step
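
For reference, output.scores is a tuple with one (batch_size, vocab_size) tensor per generated token, holding the scores after generate()'s logits processors have been applied. A short sketch of turning them into per-token values (step_scores and transition_scores are illustrative names; compute_transition_scores is part of the transformers generation API):

# stack the per-step scores into one (batch_size, num_new_tokens, vocab_size) tensor
step_scores = torch.stack(output.scores, dim=1)

# log-probability of each token that was actually generated
transition_scores = model.compute_transition_scores(
    output.sequences, output.scores, normalize_logits=True
)

Newer versions of transformers also accept output_logits=True in generate(), which returns the raw, unprocessed logits as output.logits.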