Order of execution of Top-K, Top-P sampling along with temperature

Let’s say I use:

```python
sample_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=3,
    top_p=0.51,
    temperature=0.6,
    num_return_sequences=3,
)
```

What is the order of execution in this one?
I looked at the code in the labml.ai sampling example, and it doesn't quite make sense to me: when combining temperature with top-k or top-p, it first applies softmax and selects tokens, and only then uses the sampler.

The Google Cloud documentation says that top-k is applied first, then the result is filtered with top-p, with temperature applied as well.

Let's say your probabilities are t0 → 0.4, t1 → 0.2, t2 → 0.2, t3 → 0.15, t4 → 0.05.

You apply top-k = 3 and are left with t0, t1, t2. Now you have two choices:

  1. Apply top-p = 0.51 directly, then normalize, or
  2. Normalize first, then apply top-p = 0.51, then normalize again.
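For what it's worth, both choices can be computed directly (a small sketch; the helper functions are mine, just for illustration). Note that renormalizing after top-k multiplies every kept probability by the same constant, so the ratios never change; only the point where the top-p cumulative cutoff falls can differ between the two orderings. With these particular numbers, both choices happen to keep {t0, t1}:

```python
# The probabilities from the example above.
probs = {"t0": 0.4, "t1": 0.2, "t2": 0.2, "t3": 0.15, "t4": 0.05}

def top_k(p, k):
    # keep the k most probable tokens
    return dict(sorted(p.items(), key=lambda kv: kv[1], reverse=True)[:k])

def normalize(p):
    total = sum(p.values())
    return {t: v / total for t, v in p.items()}

def top_p(p, threshold):
    # keep the smallest set of highest-probability tokens whose
    # cumulative probability reaches the threshold
    kept, cum = {}, 0.0
    for t, v in sorted(p.items(), key=lambda kv: kv[1], reverse=True):
        kept[t] = v
        cum += v
        if cum >= threshold:
            break
    return kept

after_k = top_k(probs, 3)                             # {t0: 0.4, t1: 0.2, t2: 0.2}
choice1 = normalize(top_p(after_k, 0.51))             # filter, then normalize
choice2 = normalize(top_p(normalize(after_k), 0.51))  # normalize, filter, normalize again
print(choice1)  # both keep {t0, t1} with probabilities 2/3 and 1/3
print(choice2)
```

With a different threshold the two orderings can diverge: at top-p = 0.45, choice 1 still keeps {t0, t1} (since 0.4 < 0.45), while choice 2 keeps only t0 (since 0.5 ≥ 0.45).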

On top of that, if we use temperature = 0.6, is it applied at the beginning? If so, the result differs from applying top-p alone, because the distribution has already been reshaped.
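One observation on the temperature part: dividing logits by a temperature is monotonic, so it never changes the top-k ranking, but it does reshape the probabilities and therefore shifts where the top-p cutoff lands. A small illustration (the logit values here are made up):

```python
import math

def softmax(logits, temperature=1.0):
    # softmax of logits scaled by 1 / temperature
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 1.0, 0.5, 0.0]
flat = softmax(logits)        # T = 1.0: flatter distribution
sharp = softmax(logits, 0.6)  # T = 0.6 (< 1): sharper distribution
print(flat)
print(sharp)
```

With top_p = 0.51, the nucleus at T = 1.0 contains two tokens (≈0.43 + ≈0.26 is what crosses 0.51), while at T = 0.6 the top token alone (≈0.57) already crosses it, so whether temperature is applied before top-p really does change the outcome.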

How does this work? Can someone please explain the actual order of execution?


From the source here (the top_k_top_p_filtering function), I believe it applies top-k filtering first, then top-p filtering.
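As far as I can tell from the transformers source, the full order is: temperature is applied to the raw logits first, then top-k masks logits to -inf, then top-p masks logits to -inf, and softmax is taken only once at the very end before sampling, so there is no renormalization between the filtering steps. A minimal sketch of that pipeline, assuming this ordering (the helper names are mine, not the library's):

```python
import math

def apply_temperature(logits, temperature):
    # temperature rescales logits; it is monotonic, so rankings are preserved
    return [l / temperature for l in logits]

def apply_top_k(logits, k):
    # keep the k largest logits, mask the rest to -inf
    # (simplified: ties at the cutoff are all kept here)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def apply_top_p(logits, p):
    # nucleus filtering on the already-masked logits: softmax, then keep the
    # smallest set of top tokens whose cumulative probability reaches p
    exps = [math.exp(l) for l in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return [l if i in keep else float("-inf") for i, l in enumerate(logits)]

def final_probs(logits):
    # the single softmax at the end, just before multinomial sampling
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.2, -1.0]  # made-up logits for illustration
filtered = apply_top_p(apply_top_k(apply_temperature(logits, 0.6), 3), 0.51)
result = final_probs(filtered)
print(result)
```

With these made-up logits, temperature 0.6 sharpens the distribution enough that top-p = 0.51 keeps only the single most likely token, even though top-k = 3 had kept three.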
