System Info
- transformers version: 4.20.1
- Platform: Linux-5.4.0-58-generic-x86_64-with-debian-buster-sid
- Python version: 3.7.12
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.10.1+cu111 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: False
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
text = """
Phillip, Could you please do me a favor?\nI would like to read your current title policy to see what \
it says about easements.\nYou should have received a copy during your closing.\nI don't know how many \
pages it will be but let me know how you want to handle getting a copy made.\nI'll be happy to make the copy,\
or whatever makes it easy for you.\nThanks,\n
"""
checkpoint = "Aktsvigun/bart-base_aeslc_42"
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint).cuda()
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
input_ids = tokenizer(text, truncation=True, return_tensors="pt")["input_ids"].to(model.device)
generate_output = model.generate(
    input_ids,
    num_return_sequences=4,
    length_penalty=1.,
    return_dict_in_generate=True,
    output_scores=True,
    early_stopping=True,
)
# Most probable sequence according to the generate output. Drop entries equal to 1 (padding)
# and skip the first position, since the initial generation token is not needed as a label.
labels = generate_output.sequences[0][generate_output.sequences[0] != 1][None, 1:]
out = model(input_ids, labels=labels)
probas = torch.nn.functional.softmax(out.logits, dim=-1)
sequence_score = probas[0].log().gather(index=labels[0][:, None], dim=-1).sum() / len(labels[0])
assert torch.allclose(-sequence_score, out.loss)
assert torch.allclose(sequence_score, generate_output.sequences_scores[0])
Expected behavior
The last assert should pass, yet the results differ: -0.8670 for the reconstructed score versus -0.8581 from the generate output. What the code does: I first generate a sequence with BART, then try to reproduce its score by calling .forward, computing the score as the average of the log-probabilities of the label ids taken at each decoder step.
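To make the quantity being compared explicit, here is a minimal pure-Python sketch of the score I am reconstructing: the sum of the label log-probabilities divided by the number of labels. The toy logits and labels are hypothetical, and the `length_penalty` exponent is my assumption about how beam search normalizes scores, not something verified against the library source.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_score(step_logits, labels, length_penalty=1.0):
    """Sum of label log-probs divided by len(labels) ** length_penalty.

    The length_penalty exponent is an assumption about the normalization.
    """
    total = sum(
        log_softmax(logits)[label] for logits, label in zip(step_logits, labels)
    )
    return total / (len(labels) ** length_penalty)


# Hypothetical toy values: 3 decoding steps over a 4-token vocabulary.
toy_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 3.0, 0.2, 0.1],
    [0.3, 0.3, 2.5, 0.0],
]
toy_labels = [0, 1, 2]

score = sequence_score(toy_logits, toy_labels)

# Mean cross-entropy over the same labels, the quantity compared with out.loss.
loss = -sum(
    log_softmax(logits)[label] for logits, label in zip(toy_logits, toy_labels)
) / len(toy_labels)
```

With `length_penalty=1.0` the score is exactly the negative of the mean cross-entropy, which is why the first assert in the reproduction passes. Any gap against `sequences_scores` would then have to come from the per-step log-probabilities or the length used for normalization inside `generate`.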
Why this matters: this is a “sub-bug” I found while verifying another bug. I wrote a function to restore the sequences and sequence scores from transformers.generation_utils.BeamSearchEncoderDecoderOutput.scores and got results slightly different from those output by transformers.generation_utils.BeamSearchEncoderDecoderOutput. Namely, I restore some sequences with scores higher than transformers.generation_utils.BeamSearchEncoderDecoderOutput.sequences_scores. I need to check which version (default or mine) is correct, so I need to pass the sequence through forward and calculate its “intrinsic” score. However, as this example shows, either .forward or .generate returns slightly erroneous results.