I have a fine-tuned FlanT5 model and I'm trying to use it for inference with the `model.generate` method. I'm inspecting the behaviour of decoding methods that alter the next-token probability distribution, specifically the `top_p` parameter (for nucleus sampling) and the `temperature` parameter. I was wondering what happens if I specify both `top_p` and `temperature`. Will it first flatten the distribution with a high temperature and then obtain the nucleus of this flattened distribution (i.e., temperature, then nucleus)? Or will it obtain the nucleus and then use the temperature to flatten the distribution (i.e., nucleus, then temperature)? Or something else (e.g., use only the nucleus and ignore the temperature, or vice versa)?
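To make the question concrete, here is a minimal pure-Python sketch of the two candidate orderings on a toy four-token vocabulary. The function names and the logits are hypothetical, chosen only for illustration. The point is that the order matters: with a temperature above 1, applying it first spreads probability mass, so more tokens are needed to reach `top_p` and the nucleus grows. From my reading of the `transformers` source, `generate` with `do_sample=True` appears to apply `TemperatureLogitsWarper` before `TopPLogitsWarper` (the first ordering below), and both only take effect when sampling is enabled. That is an assumption to verify against the installed version.

```python
import math


def softmax(logits, temperature=1.0):
    # Divide logits by the temperature, then apply a numerically stable softmax.
    # Masked logits of -inf stay -inf and end up with probability 0.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_mask(probs, top_p):
    # Keep the smallest set of most-probable tokens whose cumulative
    # probability reaches top_p (the "nucleus").
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return keep


def temperature_then_nucleus(logits, temperature, top_p):
    # Hypothetical helper: rescale first, then pick the nucleus of the rescaled distribution.
    probs = softmax(logits, temperature)
    keep = nucleus_mask(probs, top_p)
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def nucleus_then_temperature(logits, temperature, top_p):
    # Hypothetical helper: pick the nucleus of the original distribution,
    # mask everything else, then rescale what is left.
    keep = nucleus_mask(softmax(logits), top_p)
    masked = [x if i in keep else -math.inf for i, x in enumerate(logits)]
    return softmax(masked, temperature)


logits = [2.0, 1.0, 0.0, -1.0]
a = temperature_then_nucleus(logits, temperature=2.0, top_p=0.85)
b = nucleus_then_temperature(logits, temperature=2.0, top_p=0.85)
print([round(p, 3) for p in a])
print([round(p, 3) for p in b])
```

With these toy values, temperature-then-nucleus leaves three tokens with non-zero probability, while nucleus-then-temperature leaves only two. At `temperature=1.0` the two orderings coincide.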