Order of execution of Top-K, Top-P sampling along with temperature

Let’s say I use:

```python
sample_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=3,
    top_p=0.51,
    temperature=0.6,
    num_return_sequences=3,
)
```

What is the order of execution in this one?
I looked at the code in the labml.ai sampling example, and it doesn't quite make sense to me: when combining temperature with top-k or top-p, it first applies softmax and selects tokens, and only then uses the sampler.

The Google Cloud documentation says that top-k is applied first, then the result is filtered with top-p, with temperature applied as well.

Let's say your probabilities are t0 → 0.4, t1 → 0.2, t2 → 0.2, t3 → 0.15, t4 → 0.05.

You apply top-k = 3 and are left with t0, t1, t2. Now you have two choices:

  1. Apply top-p = 0.51 directly, then normalize, or
  2. Normalize first, then apply top-p = 0.51, then normalize again.
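For what it's worth, both choices can be computed directly (a small sketch; the helper functions are mine, just for illustration). Note that renormalizing after top-k multiplies every kept probability by the same constant, so the ratios never change; only the point where the top-p cumulative cutoff falls can differ between the two orderings. With these particular numbers, both choices happen to keep {t0, t1}:

```python
# The probabilities from the example above.
probs = {"t0": 0.4, "t1": 0.2, "t2": 0.2, "t3": 0.15, "t4": 0.05}

def top_k(p, k):
    # keep the k most probable tokens
    return dict(sorted(p.items(), key=lambda kv: kv[1], reverse=True)[:k])

def normalize(p):
    total = sum(p.values())
    return {t: v / total for t, v in p.items()}

def top_p(p, threshold):
    # keep the smallest set of highest-probability tokens whose
    # cumulative probability reaches the threshold
    kept, cum = {}, 0.0
    for t, v in sorted(p.items(), key=lambda kv: kv[1], reverse=True):
        kept[t] = v
        cum += v
        if cum >= threshold:
            break
    return kept

after_k = top_k(probs, 3)                             # {t0: 0.4, t1: 0.2, t2: 0.2}
choice1 = normalize(top_p(after_k, 0.51))             # filter, then normalize
choice2 = normalize(top_p(normalize(after_k), 0.51))  # normalize, filter, normalize again
print(choice1)  # both keep {t0, t1} with probabilities 2/3 and 1/3
print(choice2)
```

With a different threshold the two orderings can diverge: at top-p = 0.45, choice 1 still keeps {t0, t1} (since 0.4 < 0.45), while choice 2 keeps only t0 (since 0.5 ≥ 0.45).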

On top of that, if we use temperature = 0.6, is it applied at the beginning? If so, the result differs from applying top-p alone, because the distribution has already been reshaped.
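One observation on the temperature part: dividing logits by a temperature is monotonic, so it never changes the top-k ranking, but it does reshape the probabilities and therefore shifts where the top-p cutoff lands. A small illustration (the logit values here are made up):

```python
import math

def softmax(logits, temperature=1.0):
    # softmax of logits scaled by 1 / temperature
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 1.0, 0.5, 0.0]
flat = softmax(logits)        # T = 1.0: flatter distribution
sharp = softmax(logits, 0.6)  # T = 0.6 (< 1): sharper distribution
print(flat)
print(sharp)
```

With top_p = 0.51, the nucleus at T = 1.0 contains two tokens (≈0.43 + ≈0.26 is what crosses 0.51), while at T = 0.6 the top token alone (≈0.57) already crosses it, so whether temperature is applied before top-p really does change the outcome.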

How does this work? Can someone please explain the actual order of execution?


From the source here (the top_k_top_p_filtering function), I believe it applies top-k filtering first, then top-p filtering.
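As far as I can tell from the transformers source, the full order is: temperature is applied to the raw logits first, then top-k masks logits to -inf, then top-p masks logits to -inf, and softmax is taken only once at the very end before sampling, so there is no renormalization between the filtering steps. A minimal sketch of that pipeline, assuming this ordering (the helper names are mine, not the library's):

```python
import math

def apply_temperature(logits, temperature):
    # temperature rescales logits; it is monotonic, so rankings are preserved
    return [l / temperature for l in logits]

def apply_top_k(logits, k):
    # keep the k largest logits, mask the rest to -inf
    # (simplified: ties at the cutoff are all kept here)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def apply_top_p(logits, p):
    # nucleus filtering on the already-masked logits: softmax, then keep the
    # smallest set of top tokens whose cumulative probability reaches p
    exps = [math.exp(l) for l in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return [l if i in keep else float("-inf") for i, l in enumerate(logits)]

def final_probs(logits):
    # the single softmax at the end, just before multinomial sampling
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.2, -1.0]  # made-up logits for illustration
filtered = apply_top_p(apply_top_k(apply_temperature(logits, 0.6), 3), 0.51)
result = final_probs(filtered)
print(result)
```

With these made-up logits, temperature 0.6 sharpens the distribution enough that top-p = 0.51 keeps only the single most likely token, even though top-k = 3 had kept three.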
