Get top_k tokens for each time step instead of the highest probability token

When using autoregressive models like GPT-2 or GPT-NEO for text generation, is it possible at each time step to extract the top_k candidate tokens along with their probabilities, instead of only the token with the highest probability?
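
For illustration, here is a rough sketch of the kind of thing I have in mind (I'm not sure whether `output_scores=True` on `generate()` is the intended way to do this, and the exact generation arguments may differ between transformers versions):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Ask generate() to return per-step scores in addition to the generated ids
outputs = model.generate(
    **inputs,
    max_length=inputs["input_ids"].shape[1] + 5,
    do_sample=False,
    return_dict_in_generate=True,
    output_scores=True,
)

# outputs.scores is a tuple with one logits tensor per generated step;
# take a softmax and keep the top_k candidates at each step
top_k = 5
for step, step_scores in enumerate(outputs.scores):
    probs = torch.softmax(step_scores[0], dim=-1)
    top_probs, top_ids = probs.topk(top_k)
    candidates = [
        (tokenizer.decode(token_id), prob.item())
        for token_id, prob in zip(top_ids, top_probs)
    ]
    print(f"step {step}: {candidates}")
```

Is something like this the recommended approach, or is there a better way to get the alternative candidates at each step?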
cc @patrickvonplaten, @lysandre