When using autoregressive models like GPT-2 or GPT-Neo for text generation, can we extract, at each time step, the top_k candidate tokens with their probabilities, instead of only the token with the highest probability?
cc @patrickvonplaten, @lysandre
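One way to do this with the `transformers` library is to pass `output_scores=True` and `return_dict_in_generate=True` to `generate()`, which keeps the per-step logits, and then take a softmax and top-k over each step's scores. A minimal sketch, assuming `torch` and `transformers` are installed; the prompt, `max_new_tokens`, and `k=5` are illustrative choices:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def top_k_per_step(step_scores, k=5):
    """For each generation step, return the top-k token ids and their
    probabilities, computed from that step's logits."""
    results = []
    for logits in step_scores:  # one (batch, vocab_size) tensor per step
        probs = torch.softmax(logits, dim=-1)
        top_probs, top_ids = probs.topk(k, dim=-1)
        results.append((top_ids[0].tolist(), top_probs[0].tolist()))
    return results

if __name__ == "__main__":
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
    # output_scores=True keeps the per-step logits alongside the generated sequence
    out = model.generate(
        input_ids,
        max_new_tokens=3,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )
    for step, (ids, probs) in enumerate(top_k_per_step(out.scores, k=5)):
        tokens = [tokenizer.decode([i]) for i in ids]
        print(f"step {step}: {list(zip(tokens, probs))}")
```

Note that `out.scores` holds the logits after any logits processors have been applied, so with sampling options enabled they may differ from the raw model logits.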