How can I obtain the logits via model.generate()?

Hi, I am running the Meta-Llama-3-8B-Instruct model with the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'meta-llama/Meta-Llama-3-8B-Instruct'

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

... # processing data

inputs = tokenizer(prompt, return_tensors='pt').to("cuda")  # prompt is my instruction in natural language
output = model.generate(**inputs, max_new_tokens=40)

And the output is:

tensor([[ 45147,  31868,  65562,  ...,   7566, 128009, 128001]],
       device='cuda:0')
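
For readability, those ids can be decoded back into text (decoded_text is just an illustrative name):

decoded_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(decoded_text)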

Those generated token ids are what I want, but I also need the logits of the Llama-3 model. How can I get them? One option is to call the model() method directly, but that only runs a forward pass and does not produce the generated output above.
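
For context, calling the model directly runs a single forward pass: it returns logits for every position of the prompt but does not sample any new tokens. A minimal sketch, reusing model and inputs from above (forward_out is an illustrative name):

with torch.no_grad():
    forward_out = model(**inputs)

# forward_out.logits has shape (batch_size, prompt_length, vocab_size);
# forward_out.logits[:, -1, :] would be the logits for the first token to generate
print(forward_out.logits.shape)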


How about this?

Thank you, I just found the solution:

output = model.generate(**inputs, max_new_tokens=40, return_dict_in_generate=True, output_scores=True)

sequence = output.sequences  # generated token ids, prompt included
scores = output.scores  # per-step scores for the generated tokens, one tensor per step
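
For reference, output.scores is a tuple with one (batch_size, vocab_size) tensor per generated token, holding the scores after generate()'s logits processors have been applied. A short sketch of turning them into per-token values (step_scores and transition_scores are illustrative names; compute_transition_scores is part of the transformers generation API):

# stack the per-step scores into one (batch_size, num_new_tokens, vocab_size) tensor
step_scores = torch.stack(output.scores, dim=1)

# log-probability of each token that was actually generated
transition_scores = model.compute_transition_scores(
    output.sequences, output.scores, normalize_logits=True
)

Newer versions of transformers also accept output_logits=True in generate(), which returns the raw, unprocessed logits as output.logits.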