Let’s say I use:

```python
sample_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=3,
    top_p=0.51,
    temperature=0.6,
    num_return_sequences=3,
)
```
What is the order of execution in this one?
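For context, my current understanding (which is exactly what I’d like confirmed or corrected) is that the samplers run as a chain over the raw logits, e.g. temperature → top-K → top-P, with a single softmax at the end. A NumPy sketch of that *assumed* pipeline, with made-up logits — not the actual transformers implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=0.6, top_k=3, top_p=0.51, rng=None):
    """Assumed order (this is what I'm asking about):
    temperature -> top-k -> top-p, all on logits, one softmax at the end."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature   # 1. temperature

    # 2. top-k: mask everything below the k-th largest logit
    kth = np.sort(logits)[-top_k]
    logits[logits < kth] = -np.inf

    # 3. softmax over the survivors
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # 4. top-p: keep the smallest high-probability set whose mass reaches p
    order = np.argsort(-probs, kind="stable")
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    final = np.zeros_like(probs)
    final[keep] = probs[keep]
    final /= final.sum()

    return rng.choice(len(final), p=final), final

token, final = sample_next_token(np.log([0.4, 0.2, 0.2, 0.15, 0.05]))
print(token, final)
```

With these numbers, temperature 0.6 sharpens t0 so much that top-p = 0.51 keeps only t0 — which is precisely why the order matters.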
I looked at the code for the labml.ai sampling example and it doesn’t make sense to me: when combining temperature with top-K or top-P, it appears to apply softmax and select first, and only then use the other filters.
The Google Cloud documentation, on the other hand, says that top-K is applied first, then the candidates are filtered with top-P, with temperature applied as well.
Let’s say your probabilities are:

t0 → 0.4, t1 → 0.2, t2 → 0.2, t3 → 0.15, t4 → 0.05

With top-K = 3 you keep t0, t1, t2. Now you have 2 choices:
- apply top-P = 0.51 directly and then normalize, or
- normalize first, then apply top-P = 0.51, and then normalize again
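To make the two choices concrete, here is a small NumPy sketch (my own helper, not the transformers implementation) that runs top-P = 0.51 both ways on the numbers above:

```python
import numpy as np

def top_p_filter(probs, p):
    # Keep the smallest set of highest-probability tokens whose
    # cumulative mass reaches p, then renormalize the survivors.
    order = np.argsort(-probs, kind="stable")
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, p) + 1]   # inclusive cut
    out = np.zeros_like(probs, dtype=float)
    out[keep] = probs[keep]
    return out / out.sum()

# Mass left after top-K = 3 (t3 and t4 dropped), not yet renormalized
after_top_k = np.array([0.4, 0.2, 0.2])

choice_1 = top_p_filter(after_top_k, 0.51)                       # filter first
choice_2 = top_p_filter(after_top_k / after_top_k.sum(), 0.51)   # normalize first

print(choice_1.round(4))  # both keep {t0, t1} here
print(choice_2.round(4))
```

For these particular numbers both orderings happen to keep the same two tokens (t0 and t1, ending at 2/3 and 1/3). But normalizing first inflates every probability, so the cumulative sum reaches 0.51 sooner, and in general the normalize-first variant can keep strictly fewer tokens — which is why I’m asking about the order.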
On top of that, if we use temperature = 0.6, do we apply it at the beginning? If yes, then the result differs from applying top-P alone, because the distribution has already been reshaped before top-P sees it.
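To show why that shift matters: temperature rescales the logits before softmax, which (if I have the math right) is the same as raising the probabilities to the power 1/T and renormalizing. A quick sketch with the numbers above:

```python
import numpy as np

probs = np.array([0.4, 0.2, 0.2, 0.15, 0.05])
T = 0.6

# softmax(log(p) / T) is equivalent to p**(1/T), renormalized
sharpened = probs ** (1 / T)
sharpened /= sharpened.sum()

print(sharpened.round(3))  # t0 jumps from 0.40 to roughly 0.54
```

With top-P = 0.51 this changes the outcome: on the raw distribution the nucleus is {t0, t1} (0.4 + 0.2 ≥ 0.51), but after temperature t0 alone already carries about 0.54 ≥ 0.51, so only one token would survive.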
How does that actually work? Can someone please explain the exact order of execution?