opened 11:25PM - 14 Mar 24 UTC
closed 09:42AM - 16 Mar 24 UTC
### System Info
- `transformers` version: 4.35.0
- Platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.35
- Python version: 3.10.9
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.0 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
### Who can help?
_No response_
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
### Reproduction
What I am trying to do is explained in the title. As far as I understand, there are two ways of doing this:
(1) pass the `output_logits=True` argument to `model.generate()` directly, or
(2) run a forward pass of the model on the generated output sequences.
MWE below; is there something I am not understanding correctly?
```python
import random
random.seed("1234")

from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed

set_seed(1234)

model_name = "lvwerra/gpt2-imdb"
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# I want the logits at generation time
encoding = tokenizer(["Hi there, how are you?"], return_tensors="pt").to("cuda")
generation_output = model.generate(**encoding, return_dict_in_generate=True, output_logits=True)
# To be clear, this returns the logits of the generated tokens only (not the prompt),
# but that is a minor detail.

# obtain the generated sequences and their logits
sequences = generation_output.sequences
sanity_check_logits = generation_output.logits

# we already have our encoding, now run a plain forward pass on the
# generated sequences and look at its logits
model_output = model(input_ids=sequences)

# asserting that sanity_check_logits matches model_output.logits fails
```
### Expected behavior
I expect the logits from the plain forward pass to match the logits returned by `generate()`.
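For reference, here is a minimal sketch of how I would line the two outputs up for comparison (assuming greedy decoding and no padding; all names refer to the snippet above). The forward-pass logits at position i are the prediction for token i + 1, so the relevant slice starts at the last prompt position, and exact equality may be too strict since small numerical differences (e.g. from the KV cache used during generation) are possible:

```python
import torch

# Stack the per-step generation logits:
# tuple of (batch, vocab) tensors -> (batch, new_tokens, vocab)
gen_logits = torch.stack(generation_output.logits, dim=1)

# The forward pass returns one logit vector per input position, and the logits
# at position i predict token i + 1. The logits corresponding to the generated
# tokens therefore start at prompt_len - 1 and exclude the final position.
prompt_len = encoding["input_ids"].shape[1]
fwd_logits = model_output.logits[:, prompt_len - 1 : -1, :]

assert gen_logits.shape == fwd_logits.shape

# Compare with a tolerance rather than exact equality.
print(torch.allclose(gen_logits, fwd_logits, atol=1e-4))
```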