The description for this parameter (for example, in the Mixtral config) states:
The number of experts to root per-token, can be also interpreted as the `top-p` routing
I am not sure I understand this correctly. Since this is an integer (set to 2 in the Mixtral case), why is it called top-p rather than top-k routing?
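
To make the distinction concrete, here is a minimal sketch of how I understand the two terms. The function names and router scores are made up for illustration, and this is not the actual Mixtral routing code: top-k takes an integer count of experts, while top-p takes a cumulative-probability threshold.

```python
import math


def softmax(logits):
    """Convert raw router scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits, k):
    """Top-k: pick exactly k experts, where k is an integer count."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return order[:k]


def top_p_route(logits, p):
    """Top-p: pick the smallest set of experts whose cumulative
    probability reaches p, where p is a float in (0, 1]."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return chosen


# Hypothetical router scores for one token over 4 experts.
router_logits = [2.0, 0.5, 1.5, -1.0]

print(top_k_route(router_logits, k=2))    # always 2 experts
print(top_p_route(router_logits, p=0.9))  # number of experts depends on the scores
```

With an integer value such as 2, only the first interpretation seems to apply, which is why the "top-p" wording confuses me.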