I noticed that when a model returns exactly tied logits for two different tokens, generate does not always choose the lowest token id.
import torch

input_ids = torch.tensor([[128000, 16533, 279, 2768, 3488, 304, 264, 2478, 4339, 323, 449, 912, 37666, 13, 3639, 574, 279, 8250, 315, 578, 1050, 359, 2461, 315, 279, 21080, 555, 89315, 92898, 30, 578, 1050, 359, 2461, 315, 279, 21080, 555, 89315, 92898, 36513, 369]], device='cuda:0')
attention_mask = torch.ones_like(input_ids)
# model: LlamaForCausalLM
# Greedy decoding: sampling and all warpers (temperature/top_p/top_k) are
# explicitly disabled, so the highest-logit token should be picked at each step.
generated = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_new_tokens=20,
    min_new_tokens=20,
    do_sample=False,
    temperature=None,
    use_cache=False,
    top_p=None,
    top_k=None,
    output_logits=True,
    return_dict_in_generate=True,
)
logits = torch.stack(generated.logits, dim=1)   # (batch, new_tokens, vocab)
logits[0][1][18] == logits[0][1][19]            # True (20.8750 == 20.8750)
logits.argmax(dim=2)[0][1]                      # 18 (chosen by torch.argmax)
generated.sequences[0][input_ids.shape[1] + 1]  # 19 (chosen by HF generate)
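For what it's worth, the tie is easy to reproduce in isolation. A minimal sketch (the logit value 20.8750 and the token ids 18/19 are copied from the run above; the vector length is arbitrary):

import torch

# Two positions share the maximal value, mirroring the tie above.
v = torch.full((32,), -1.0)
v[18] = 20.8750
v[19] = 20.8750
# torch.max documents first-occurrence tie-breaking (i.e. the lowest index, 18);
# I have not found an equivalent guarantee stated for the CUDA argmax kernel,
# which is what the stacked-logits argmax above actually runs on.
print(torch.argmax(v))  # tensor(18) under first-occurrence semantics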
What algorithm does generate use to break ties for the highest logit? I cannot find any documentation about this on the Generation page, and this behavior is making my model give wrong results later on.
Note that there are no precision problems here: the two logits have the exact same value.
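For now I'm working around it with a manual greedy loop that breaks ties explicitly toward the lowest token id, so the result no longer depends on whichever argmax kernel gets called. This is only a sketch under my assumptions (batch of 1, no KV cache, no EOS handling), not what generate does internally:

import torch

def greedy_lowest_id(model, input_ids, attention_mask, max_new_tokens=20):
    """Greedy decoding that breaks exact logit ties toward the lowest token id.
    Sketch only: batch of 1, no KV cache, no EOS handling."""
    vocab_size = model.config.vocab_size
    idx = torch.arange(vocab_size, device=input_ids.device)
    for _ in range(max_new_tokens):
        out = model(input_ids=input_ids, attention_mask=attention_mask, use_cache=False)
        next_logits = out.logits[:, -1, :]        # (batch, vocab)
        max_vals = next_logits.max(dim=-1, keepdim=True).values
        # Replace every non-maximal position with an out-of-range sentinel,
        # then take the minimum: the lowest token id among the tied maxima.
        candidates = idx.expand_as(next_logits).masked_fill(next_logits != max_vals, vocab_size)
        next_token = candidates.min(dim=-1).values  # (batch,)
        input_ids = torch.cat([input_ids, next_token[:, None]], dim=-1)
        attention_mask = torch.cat([attention_mask, torch.ones_like(next_token)[:, None]], dim=-1)
    return input_ids

Apart from exact ties, this should select the same tokens as greedy search (and it is much slower without the cache), so a pointer to the actual tie-breaking rule in generate would still be appreciated.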