Hi,
I have a fine-tuned FlanT5 model and I'm trying to use it for inference with the model.generate method. I'm inspecting the behaviour of decoding methods that alter the next-token probability distribution, specifically the top_p parameter (for nucleus sampling) and the temperature parameter. What happens if I specify both top_p and temperature? Will it first flatten the distribution with a high temperature and then obtain the nucleus of this flattened distribution (i.e., temperature, then nucleus)? Or will it obtain the nucleus and then use the temperature to flatten the distribution (i.e., nucleus, then temperature)? Or something else (e.g., only use nucleus and ignore temperature, or vice versa)?
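To make the two orderings concrete, here is a minimal sketch in plain Python on a toy four-token distribution. It only illustrates that the order can matter; it makes no claim about which order model.generate actually uses. The helper names (softmax, nucleus_indices, temperature_then_nucleus, nucleus_then_temperature) and the toy logits are hypothetical and not part of any library.

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature, then normalise (numerically stable).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_indices(probs, top_p):
    # Smallest set of most-probable tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)

def temperature_then_nucleus(logits, temperature, top_p):
    # Option 1: flatten first, then take the nucleus of the flattened distribution.
    probs = softmax(logits, temperature)
    kept = nucleus_indices(probs, top_p)
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def nucleus_then_temperature(logits, temperature, top_p):
    # Option 2: take the nucleus of the unscaled distribution, then flatten it.
    kept = nucleus_indices(softmax(logits), top_p)
    probs = softmax([logits[i] for i in kept], temperature)
    return dict(zip(kept, probs))

logits = [4.0, 2.0, 1.0, 0.0]  # toy values, chosen only for illustration
a = temperature_then_nucleus(logits, temperature=2.0, top_p=0.9)
b = nucleus_then_temperature(logits, temperature=2.0, top_p=0.9)
print(sorted(a))  # tokens kept under option 1: [0, 1, 2]
print(sorted(b))  # tokens kept under option 2: [0, 1]
```

With these toy logits, applying the temperature first spreads the probability mass enough that a third token enters the nucleus, whereas taking the nucleus first keeps only two tokens. So the two orderings can sample from different candidate sets.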
Thank you!