When using greedy decoding on a causal LM, how does `generate` handle tie-breaking between logits?

I noticed that when a model returns tied logits for two different tokens, `generate` does not always choose the one with the lower token id.

```python
import torch

input_ids = torch.tensor([[128000,  16533,    279,   2768,   3488,    304,    264,   2478,   4339,
                              323,    449,    912,  37666,     13,   3639,    574,    279,   8250,
                              315,    578,   1050,    359,   2461,    315,    279,  21080,    555,
                            89315,  92898,     30,    578,   1050,    359,   2461,    315,    279,
                            21080,    555,  89315,  92898,  36513,    369]], device='cuda:0')
attention_mask = torch.ones_like(input_ids)

# model: LlamaForCausalLM, already loaded on cuda:0
generated = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_new_tokens=20,
    min_new_tokens=20,
    do_sample=False,   # greedy decoding
    temperature=None,
    use_cache=False,
    top_p=None,
    top_k=None,
    output_logits=True,
    return_dict_in_generate=True,
)
logits = torch.stack(generated.logits, dim=1)  # (batch, new_tokens, vocab)

logits[0][1][18] == logits[0][1][19]  # True (20.8750 == 20.8750)

logits.argmax(dim=2)[0][1]                      # 18 (chosen by torch.argmax)
generated.sequences[0][input_ids.shape[1] + 1]  # 19 (chosen by HF generate)
```
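
To rule out a device-dependent kernel, I also ran the same reduction on a CPU copy of the logits (my own diagnostic; I could not find a firm tie-breaking guarantee for `torch.argmax` in the docs, so the CPU and CUDA kernels may well disagree on ties):

```python
# Same argmax on a CPU copy, to see whether the tie-break depends on the
# device/kernel rather than on anything generate does internally.
logits.detach().cpu().argmax(dim=2)[0][1]
```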

What algorithm does `generate` use to break ties between equal highest logits? I cannot find any documentation about this on the Generation page, and the discrepancy is making my model give wrong results later on.
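
In case it matters, my current workaround is a manual greedy loop that breaks ties explicitly toward the lowest token id (a sketch of my own, not what `generate` actually does; `greedy_decode_lowest_id` is my own helper):

```python
import torch

@torch.no_grad()
def greedy_decode_lowest_id(model, input_ids, attention_mask, max_new_tokens):
    """Greedy decoding that deterministically breaks ties toward the lowest token id."""
    for _ in range(max_new_tokens):
        out = model(input_ids=input_ids, attention_mask=attention_mask)
        next_logits = out.logits[:, -1, :]  # (batch, vocab)
        vocab = next_logits.shape[-1]
        is_max = next_logits == next_logits.max(dim=-1, keepdim=True).values
        # Replace non-maximal positions with a sentinel id (vocab), then take the
        # minimum: that is the lowest token id among all tied maxima.
        token_range = torch.arange(vocab, device=next_logits.device).expand_as(is_max)
        candidates = torch.where(is_max, token_range, torch.full_like(token_range, vocab))
        next_tokens = candidates.min(dim=-1).values
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
        attention_mask = torch.cat(
            [attention_mask, torch.ones_like(next_tokens)[:, None]], dim=-1
        )
    return input_ids
```

On the example above this selects token 18 at the tied step (the lower of the two tied ids), which is what I expected `generate` to do.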

Note that there are no precision problems here: the two logits have the exact same value.
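
To double-check, the raw bit patterns are identical too (assuming float32 logits here; for float16/bfloat16 you would view as `torch.int16` instead):

```python
pair = logits[0][1][18:20]  # the two tied logits
pair.view(torch.int32)      # identical integers => the values are bit-for-bit equal
```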